CN118447435A

CN118447435A - Analysis method and device for video teaching content

Info

Publication number: CN118447435A
Application number: CN202410671446.0A
Authority: CN
Inventors: 邱雪茵; 罗歆昱; 郭宜纹; 蔡佳峰; 王兆均; 陈崇雨
Original assignee: DMAI Guangzhou Co Ltd
Current assignee: DMAI Guangzhou Co Ltd
Priority date: 2024-05-28
Filing date: 2024-05-28
Publication date: 2024-08-06
Anticipated expiration: 2044-05-28
Also published as: CN118447435B

Abstract

The present invention discloses a method and device for analyzing video teaching content. A teaching video to be analyzed is first obtained, and frame extraction is performed on it to obtain candidate video frames with timestamps. A teaching catalog page, a teaching title page and a teaching content page are then identified from these frames. Finally, a teaching knowledge tree is constructed from this information as the analysis result of the video. This design improves the retrievability of the video content and helps learners quickly locate and understand the key knowledge points in the video.

Description

A method and device for analyzing video teaching content

Technical Field

The present invention relates to the field of video processing technology, and in particular to a method and device for analyzing video teaching content.

Background Art

With the development of Internet technology, online teaching videos have become an important learning resource. However, as the number of videos surges, effectively organizing, classifying and retrieving video content has become a challenge. Existing video analysis methods tend to focus on extracting low-level features of a video, such as color and texture. For teaching videos, these methods struggle to capture semantic information such as chapter structure and knowledge points. It is therefore particularly important to develop a method capable of in-depth analysis of video teaching content.

Summary of the Invention

The purpose of the present invention is to provide a method and device for analyzing video teaching content.

In a first aspect, an embodiment of the present invention provides a method for analyzing video teaching content, comprising:

obtaining a teaching video to be analyzed;

performing frame extraction on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, wherein each candidate teaching video frame is configured with a timestamp;

determining a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames, respectively;

constructing, according to the teaching catalog page, the teaching title page and the teaching content page, a teaching knowledge tree as the analysis result of the teaching video to be analyzed.

In an embodiment of the present invention, performing frame extraction on the teaching video to be analyzed to obtain the plurality of candidate teaching video frames includes:

extracting frames from the teaching video to be analyzed at a preset time interval to obtain a plurality of initial teaching video frames;

deduplicating the plurality of initial teaching video frames to obtain the plurality of candidate teaching video frames.
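The interval-based sampling step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `sample_timestamps` and the choice to start sampling at 0 seconds are assumptions made here, and decoding the actual frames at these timestamps is left to whatever video library is in use.

```python
from typing import List


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be sampled.

    Hypothetical helper: one frame is taken every `interval_s` seconds,
    starting at 0, and each frame keeps its timestamp for later lookup.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(duration_s // interval_s)
    return [i * interval_s for i in range(count)]


# A 45-minute (2700-second) video sampled every 5 seconds.
timestamps = sample_timestamps(2700, 5)
print(len(timestamps), timestamps[:3], timestamps[-1])
```

Keeping the timestamp alongside each sampled frame is what later allows a knowledge point to be bound to a position in the video.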

In an embodiment of the present invention, deduplicating the plurality of initial teaching video frames to obtain the plurality of candidate teaching video frames includes:

calculating the structural similarity index of each pair of adjacent initial teaching video frames among the plurality of initial teaching video frames;

if any structural similarity index is greater than a preset similarity index threshold, deleting the later frame of that pair and recalculating the structural similarity index of each pair of adjacent frames among the remaining initial teaching video frames, until the deduplication ends;

if no structural similarity index is greater than the preset similarity index threshold, ending the deduplication;

using the deduplicated initial teaching video frames as the plurality of candidate teaching video frames.
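One way to realize this delete-and-recompute loop is a single forward pass that compares each frame with the most recently kept frame, since deleting the later frame of a pair makes its successor the new neighbor. The sketch below is an assumption about how the steps could be coded, not the patent's own code: `deduplicate` is a hypothetical name, the similarity measure is passed in as a callable so that any SSIM implementation can be plugged in, and the default threshold of 0.98 is only an example value.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def deduplicate(frames: Sequence[Frame],
                similarity: Callable[[Frame, Frame], float],
                threshold: float = 0.98) -> List[Frame]:
    """Drop the later frame of every adjacent pair that is too similar."""
    kept: List[Frame] = []
    for frame in frames:
        # Compare with the last kept frame: after a deletion, that frame
        # is the new left-hand neighbor of the next candidate.
        if kept and similarity(kept[-1], frame) > threshold:
            continue
        kept.append(frame)
    return kept


# Toy usage: integers stand in for frames, equality stands in for SSIM.
same = lambda a, b: 1.0 if a == b else 0.0
print(deduplicate([1, 1, 2, 2, 2, 3, 1], same))
```

After the pass, no two adjacent kept frames exceed the threshold, which is the termination condition stated above.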

In an embodiment of the present invention, determining the teaching catalog page from the plurality of candidate teaching video frames includes:

extracting the text content of a target candidate teaching video frame, wherein the target candidate teaching video frame is any one of the plurality of candidate teaching video frames;

analyzing the text content to obtain the keywords it contains;

if the keywords include a preset catalog keyword, using the content corresponding to the target candidate teaching video frame as the teaching catalog page.
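The keyword check can be sketched as below, assuming the frame's text has already been extracted (for example by OCR). The names `CATALOG_KEYWORDS` and `is_catalog_page` are hypothetical, the keyword list is an illustrative English stand-in for a preset list, and plain word tokenization is used here in place of a real keyword extractor.

```python
import re
from typing import Sequence

# Illustrative preset list; a real deployment would configure its own.
CATALOG_KEYWORDS = ("chapter", "section", "part")


def is_catalog_page(text: str,
                    catalog_keywords: Sequence[str] = CATALOG_KEYWORDS) -> bool:
    """Return True if the frame text contains any preset catalog keyword."""
    # Simplified keyword extraction: lower-case alphabetic tokens.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(keyword in words for keyword in catalog_keywords)


print(is_catalog_page("Chapter 1 Introduction\nChapter 2 Random Variables"))
print(is_catalog_page("P(A|B) = P(A and B) / P(B)"))
```

Matching whole tokens rather than substrings avoids treating a word such as "partial" as the catalog keyword "part".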

In an embodiment of the present invention, determining the teaching title page from the plurality of candidate teaching video frames includes:

acquiring text font information of a target candidate teaching video frame, wherein the target candidate teaching video frame is any one of the plurality of candidate teaching video frames, and the text font information includes font size, font position and keyword information;

if title content is determined from the text font information according to the font size, font position and keyword information, using the content corresponding to the target candidate teaching video frame as the teaching title page.
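A heuristic combining the three cues (font size, position, keywords) might look like the sketch below. Everything in it is an assumption for illustration: the `TextLine` record, the use of glyph height as a proxy for font size, and the default values of `size_ratio` and `max_y_center` are not taken from the patent.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class TextLine:
    text: str
    height: float    # glyph height in pixels, a proxy for font size
    y_center: float  # vertical center of the line, 0.0 = top, 1.0 = bottom


def find_title(lines: Sequence[TextLine],
               size_ratio: float = 1.5,
               max_y_center: float = 0.6,
               title_keywords: Sequence[str] = ()) -> Optional[str]:
    """Return the title text if one line looks like a title, else None."""
    if not lines:
        return None
    largest = max(lines, key=lambda line: line.height)
    other_heights = [line.height for line in lines if line is not largest]
    # Font size cue: the title should clearly dominate the other lines.
    if other_heights and largest.height < size_ratio * median(other_heights):
        return None
    # Position cue: the title should sit in the top or center region.
    if largest.y_center > max_y_center:
        return None
    # Keyword cue (optional): require one expected keyword when given.
    text = largest.text.lower()
    if title_keywords and not any(k.lower() in text for k in title_keywords):
        return None
    return largest.text


slide = [TextLine("Basics of Probability Theory in Mathematics", 64.0, 0.45),
         TextLine("Online course recording", 24.0, 0.70)]
print(find_title(slide))
```

A frame for which `find_title` returns a string would be treated as a teaching title page; a frame whose lines are all of similar size yields `None`.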

In an embodiment of the present invention, determining the teaching content page from the plurality of candidate teaching video frames includes:

using the content other than the teaching catalog page and the teaching title page as the teaching content page.
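Because content pages are defined by exclusion, the three page types can be assigned in a single pass. The sketch below assumes two predicate functions are already available; `label_frames` and the rule that the catalog check takes precedence over the title check are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def label_frames(frames: Sequence[Frame],
                 is_catalog: Callable[[Frame], bool],
                 is_title: Callable[[Frame], bool]) -> List[str]:
    """Label each frame as 'catalog', 'title' or, by exclusion, 'content'."""
    labels: List[str] = []
    for frame in frames:
        if is_catalog(frame):
            labels.append("catalog")
        elif is_title(frame):
            labels.append("title")
        else:
            labels.append("content")  # everything else is a content page
    return labels


# Toy usage with stand-in predicates over already extracted text.
texts = ["Chapter 1 Introduction", "TITLE: Basics of Probability", "P(A|B) = ..."]
print(label_frames(texts,
                   is_catalog=lambda t: "chapter" in t.lower(),
                   is_title=lambda t: t.startswith("TITLE")))
```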

In an embodiment of the present invention, constructing, according to the teaching catalog page, the teaching title page and the teaching content page, the teaching knowledge tree as the analysis result of the teaching video to be analyzed includes:

obtaining the first keywords of all the teaching catalog pages, and using the first keywords as the index basis of the teaching knowledge tree;

obtaining the second keyword of each teaching title page, and determining the key knowledge point corresponding to each teaching title page according to the second keyword;

determining the association between each teaching content page and each teaching catalog page and each teaching title page, and thereby determining the positional relationship of the teaching catalog pages, teaching title pages and teaching content pages within the teaching knowledge tree;

constructing the teaching knowledge tree according to the index basis, the key knowledge points, the positional relationship and the timestamps, wherein each key knowledge point is bound to a timestamp.
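A minimal data structure for such a tree is sketched below, assuming the association step has already produced (chapter, knowledge point, timestamp) records. The `Node` class, the `build_tree` function and the sample timestamp are hypothetical and only illustrate how a knowledge point can carry its bound timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    title: str
    timestamp_s: Optional[float] = None  # bound video position, if any
    children: List["Node"] = field(default_factory=list)


def build_tree(course_title: str,
               records: Sequence[Tuple[str, str, float]]) -> Node:
    """Build root -> chapter -> knowledge-point nodes from flat records."""
    root = Node(course_title)
    chapters: Dict[str, Node] = {}
    for chapter, point, timestamp_s in records:
        chapter_node = chapters.get(chapter)
        if chapter_node is None:
            # First time this chapter (index keyword) is seen.
            chapter_node = Node(chapter)
            chapters[chapter] = chapter_node
            root.children.append(chapter_node)
        # The knowledge point keeps the timestamp of its title page.
        chapter_node.children.append(Node(point, timestamp_s))
    return root


tree = build_tree("Programming Basics", [
    ("Chapter 2: Variables and Data Types",
     "2.1 Definition and Declaration of Variables", 320.0),  # illustrative time
])
print(tree.children[0].children[0])
```

Storing the timestamp on the node is what lets a player jump straight to the matching position when a learner selects a knowledge point.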

In an embodiment of the present invention, the method further includes:

in response to a video frame modification operation, performing at least one of adding, deleting and reordering on the plurality of candidate teaching video frames;

in response to a text adjustment operation, modifying the text content included in the teaching catalog page, the teaching title page and the teaching content page;

in response to a knowledge tree saving operation, saving the teaching knowledge tree and binding it to the video identifier of the teaching video to be analyzed.

In an embodiment of the present invention, obtaining the teaching video to be analyzed includes:

setting up a synchronization mechanism with a preset teaching video database;

when it is detected that a new teaching video to be processed has been stored in the preset teaching video database, determining whether the teaching video to be processed is a recorded demonstration video and whether its duration exceeds a preset minimum duration;

if so, using the teaching video to be processed as the teaching video to be analyzed;

if not, deleting the teaching video to be processed from the preset teaching video database.
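The admission check can be sketched as a small decision function. The `PendingVideo` record, the `triage` name and the 60-second default are assumptions for illustration; how a video is recognized as a recorded demonstration video is outside this sketch and is represented by a precomputed flag.

```python
from dataclasses import dataclass


@dataclass
class PendingVideo:
    video_id: str
    is_recorded_demo: bool  # assumed to be decided by an upstream check
    duration_s: float


def triage(video: PendingVideo, min_duration_s: float = 60.0) -> str:
    """Return 'analyze' if both conditions hold, otherwise 'delete'."""
    if video.is_recorded_demo and video.duration_s > min_duration_s:
        return "analyze"
    return "delete"


print(triage(PendingVideo("v1", True, 2700.0)))
print(triage(PendingVideo("v2", True, 12.0)))
```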

In a second aspect, an embodiment of the present invention provides a device for analyzing video teaching content, comprising:

an acquisition module, configured to obtain a teaching video to be analyzed;

an analysis module, configured to: perform frame extraction on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, each configured with a timestamp; determine a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames, respectively; and construct, according to the teaching catalog page, the teaching title page and the teaching content page, a teaching knowledge tree as the analysis result of the teaching video to be analyzed.

Compared with the prior art, the beneficial effects of the present invention include the following. With the disclosed method and device for analyzing video teaching content, a teaching video to be analyzed is obtained, and frame extraction yields candidate video frames with timestamps. A teaching catalog page, a teaching title page and a teaching content page are then identified from these frames. Finally, a teaching knowledge tree is constructed from this information as the analysis result of the video. This design improves the searchability of the video content and helps learners quickly locate and understand the key knowledge points in the video.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a schematic structural diagram provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram provided by an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The components of the embodiments described and shown in the drawings can generally be arranged and designed in a variety of configurations.

Specific implementations of the present invention are described in detail below with reference to the drawings.

To solve the technical problem described in the Background Art, FIG. 1 shows a schematic flowchart of the method for analyzing video teaching content provided by an embodiment of the present disclosure. The method is described in detail below.

Step S201: obtaining a teaching video to be analyzed;

Step S202: performing frame extraction on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, wherein each candidate teaching video frame is configured with a timestamp;

Step S203: determining a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames, respectively;

Step S204: constructing, according to the teaching catalog page, the teaching title page and the teaching content page, a teaching knowledge tree as the analysis result of the teaching video to be analyzed.

In an embodiment of the present invention, by way of example, the server retrieves a teaching video to be analyzed from a video library. The video is a recording of an online lesson on "Basics of Probability Theory in Mathematics", about 45 minutes long, in MP4 format. The server uses video processing software to extract frames from the video at a rate of one frame per second, so that key information in the video is not missed. Frame extraction yields 2700 candidate teaching video frames with timestamps (45 minutes × 60 seconds × 1 frame per second). The server then runs an image recognition algorithm over these 2700 candidate frames one by one and identifies the following key frames. The frame at timestamp 00:03 shows a clear teaching catalog page listing the main content and chapter arrangement of the lesson. The frame at timestamp 00:10 shows the teaching title page, bearing the title "Basics of Probability Theory in Mathematics". The frames at timestamps such as 05:20, 10:40 and 15:50 show teaching content pages containing specific mathematical formulas and worked examples.

Based on the identified teaching catalog page, teaching title page and teaching content pages, the server builds the teaching knowledge tree. It first takes "Basics of Probability Theory in Mathematics" as the root node and adds each chapter from the teaching catalog page as a child node. It then adds the specific formulas, concepts and examples from the identified teaching content pages under the node of the corresponding chapter. The result is a complete teaching knowledge tree that shows the knowledge structure and content hierarchy of the lesson. The tree can help learners understand the course content and can also serve as a reference for teachers when preparing lessons and evaluating teaching.

In an embodiment of the present invention, the aforementioned step S202 may be implemented as in the following example.

In an embodiment of the present invention, by way of example and continuing the previous scenario, the server has obtained the recording of the online lesson on "Basics of Probability Theory in Mathematics". The server now extracts frames from this 45-minute video at a preset time interval, for example every 5 seconds. That is, throughout the video the server takes one frame every 5 seconds as an initial teaching video frame, so the 45-minute (2700-second) video yields about 540 initial teaching video frames. Some of these 540 frames may be duplicates, especially when the video contains static pictures or the teacher is switching between slides while explaining. To improve the accuracy and efficiency of subsequent analysis, the server deduplicates these initial teaching video frames. Specifically, the server runs an image similarity algorithm to compare adjacent initial teaching video frames. If two adjacent frames are highly similar (for example, a similarity above 95%), they are treated as duplicates and one of them is deleted. This removes redundant frames and yields a compact, non-repeating set of candidate teaching video frames.

For example, of the original 540 initial teaching video frames, about 400 candidate teaching video frames may remain after deduplication. These candidate frames serve as the base data for subsequent steps, such as determining the teaching catalog page, teaching title page and teaching content page. Frame extraction and deduplication allow the server to analyze the teaching video more efficiently, extract key teaching information, and provide a basis for building an accurate teaching knowledge tree.

In an embodiment of the present invention, the aforementioned step of deduplicating the plurality of initial teaching video frames to obtain the plurality of candidate teaching video frames may be implemented as in the following example.

In an embodiment of the present invention, by way of example, after the frame extraction above the server has 540 initial teaching video frames. The server now calculates the structural similarity index (SSIM) between each pair of adjacent frames. SSIM is a measure of the similarity between two images that takes luminance, contrast and structure into account. For example, the server first calculates the SSIM between frames 1 and 2, then between frames 2 and 3, and so on, up to the last adjacent pair (frames 539 and 540). The server sets a similarity index threshold, for example 0.98, to judge whether two adjacent frames are too similar. After calculating the SSIM of all adjacent pairs, the server checks whether any value exceeds the threshold. Suppose the SSIM between frames 20 and 21 is 0.99, which exceeds the threshold of 0.98. The server then judges the two frames to be too similar and decides to delete one of them.

In this example, the server deletes the later frame, frame 21. After the deletion, the original frame 22 becomes the new frame 21, and the server recalculates the SSIM between frame 20 and the new frame 21 to keep the deduplication accurate. This process continues until no adjacent pair has an SSIM above the threshold. If, after a round of calculation, the server finds that no adjacent pair exceeds the threshold, it judges the current set of initial teaching video frames to be sufficiently deduplicated and ends the deduplication. After this processing, the original 540 initial teaching video frames may be reduced to 400 or fewer. The deduplicated frames are now called "candidate teaching video frames" and are used for subsequent teaching content analysis, such as determining the teaching catalog page, teaching title page and teaching content page. In this way, each frame in the candidate set carries distinct teaching information, providing a high-quality data basis for subsequent analysis.
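To make the luminance, contrast and structure terms concrete, a minimal single-window SSIM can be written in plain Python. This is a deliberate simplification and not the patent's implementation: the reference definition of SSIM is evaluated over local windows and averaged, whereas the sketch treats the whole grayscale frame as one window. The name `global_ssim`, the flat pixel-list input and the 8-bit `dynamic_range` default are assumptions, and the stabilizing constants use the commonly cited values K1 = 0.01 and K2 = 0.03.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over two equally sized grayscale pixel lists."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


frame = [10, 50, 200, 120]
print(global_ssim(frame, frame))            # identical frames score about 1.0
print(global_ssim([0.0] * 4, [255.0] * 4))  # black vs. white scores near 0
```

In practice a windowed implementation from an image-processing library would normally be preferred, since local windows are more sensitive to small slide changes than a whole-frame statistic.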

In an embodiment of the present invention, the aforementioned step of determining the teaching catalog page from the plurality of candidate teaching video frames may be implemented as in the following example.

In an embodiment of the present invention, by way of example, the server has extracted frames from the teaching video to be analyzed and deduplicated them to obtain a plurality of candidate teaching video frames. The server now determines the teaching catalog page from these frames. First, the server selects a target candidate teaching video frame. This frame is the slide shown by the teacher at the beginning of the lecture and is likely to be a teaching catalog page. The server uses OCR (optical character recognition) to extract the text content of the frame. For example, OCR recognizes the text "Chapter 1 Introduction", "Chapter 2 Probability Theory Foundation", "Chapter 3 Random Variables and Their Distribution", and so on. The server analyzes the extracted text to obtain keywords, possibly using natural language processing methods such as word frequency analysis or TF-IDF. In this example, the keywords identified by the server may include "Introduction", "Probability Theory Foundation" and "Random Variables", all of which are closely related to the teaching content. The server also holds a preset list of catalog keywords, which may include common catalog structure words such as "chapter", "section" and "part".

The server then compares the extracted keywords with this preset list. In this example, "chapter" is a preset catalog keyword, and the keywords extracted from the text content do include "chapter", which indicates that this candidate teaching video frame is likely to be a teaching catalog page. Because the extracted keywords match a preset catalog keyword, the server determines that this frame is a teaching catalog page and saves its content as the teaching catalog page for later use. Through this process, the server determines the teaching catalog page from the plurality of candidate teaching video frames, providing important information for subsequent teaching content analysis and knowledge tree construction.
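Once a frame has been accepted as a catalog page, its OCR lines can be turned into structured chapter entries for later use. The sketch below is an illustration under stated assumptions: the `parse_catalog` function and the regular expression are hypothetical, and the pattern only handles English headings of the form "Chapter N Title" with Arabic numerals, so other numbering styles would need their own pattern.

```python
import re
from typing import List, Sequence, Tuple

# Matches headings such as "Chapter 2 Probability" or "Chapter 2: Variables".
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+)\s*[:.\-]?\s*(.+?)\s*$",
                        re.IGNORECASE)


def parse_catalog(lines: Sequence[str]) -> List[Tuple[int, str]]:
    """Extract (chapter number, chapter title) pairs from OCR text lines."""
    entries: List[Tuple[int, str]] = []
    for line in lines:
        match = CHAPTER_RE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


print(parse_catalog([
    "Chapter 1 Introduction",
    "Chapter 2 Probability Theory Foundation",
    "Chapter 3 Random Variables and Their Distribution",
]))
```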

In an embodiment of the present invention, the aforementioned step of determining the teaching title page from the plurality of candidate teaching video frames may be implemented as in the following example.

In an embodiment of the present invention, by way of example, the server continues to process the candidate teaching video frames obtained by frame extraction and deduplication. To determine the teaching title page, the server selects a target candidate teaching video frame for analysis. This frame may be a slide containing the course title, shown before the teacher begins the formal explanation. The server uses OCR and image processing to extract the text font information of the frame, including font size, font position and keywords. For example, OCR may recognize a line of large text in the frame, "Basics of Probability Theory in Mathematics", and also detect that this line is significantly larger than the other text and is located in the center of the slide. The server analyzes the extracted text font information as follows. First, it looks for text whose font size is significantly larger than that of the surrounding text, because such text is likely to be a title. Second, it considers the position of the text, since a title is usually placed in a prominent position on the slide, such as the center or the top.

Finally, the server also takes keyword information into account, since certain words or phrases are more likely to be part of a title. In this example, the line "Basics of Probability Theory in Mathematics" is likely to be recognized as title content because of its large font and central position. Based on this analysis, the server determines that "Basics of Probability Theory in Mathematics" in this candidate teaching video frame is the teaching title, and saves the content of the frame as the teaching title page for later use. Through this process, the server determines the teaching title page from the plurality of candidate teaching video frames, which provides key information for constructing the teaching knowledge tree and for subsequent teaching content analysis.

在本发明实施例中，前述从所述多个候选教学视频帧中，确定出所述教学内容页的步骤，可以通过以下示例执行实施。In the embodiment of the present invention, the aforementioned step of determining the teaching content page from the multiple candidate teaching video frames can be implemented through the following example.

In an embodiment of the present invention, by way of example, the server has already identified and extracted the teaching catalog page and the teaching title pages from the multiple candidate teaching video frames in the preceding steps, and now needs to determine the teaching content pages. A teaching content page usually contains the specific knowledge points, examples and explanations of the course; it covers everything in the teaching video other than the catalog page and the title pages. The server first excludes the candidate teaching video frames already determined to be teaching catalog pages or teaching title pages, since these frames have been identified as structural elements and therefore are not content pages. The server then classifies the remaining candidate teaching video frames as teaching content pages. These frames contain the material the teacher actually explains, such as formula derivations, case analyses and experimental operations. For example, in an online course on "Basics of Probability Theory in Mathematics", after the catalog page and title pages have been identified, one remaining frame shows the teacher explaining a specific probability formula and another shows the teacher working through the steps of a probability problem. The server determines both frames to be teaching content pages because they contain specific knowledge points and explanations. In this way the server determines the teaching content pages from the multiple candidate teaching video frames, providing the basic data for subsequent in-depth analysis and knowledge extraction.
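The exclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Frame` class, its fields, and the function name `select_content_pages` are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    # Hypothetical representation of a candidate teaching video frame.
    timestamp: float  # seconds from the start of the video
    text: str         # text recognized on the frame


def select_content_pages(candidates, catalog_pages, title_pages):
    """Return the candidates that are neither catalog nor title pages, in time order."""
    excluded = set(catalog_pages) | set(title_pages)
    remaining = [f for f in candidates if f not in excluded]
    return sorted(remaining, key=lambda f: f.timestamp)


frames = [
    Frame(0.0, "Contents"),
    Frame(30.0, "1.1 Sample spaces"),
    Frame(60.0, "P(A|B) = P(A and B) / P(B)"),
    Frame(90.0, "Worked example: two dice"),
]
content = select_content_pages(frames, catalog_pages=[frames[0]], title_pages=[frames[1]])
print([f.timestamp for f in content])
```

Here the two frames showing a formula and a worked example remain after the catalog page and the title page are removed.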

In an embodiment of the present invention, the aforementioned step S204 may be implemented as in the following example.

In an embodiment of the present invention, by way of example, once the server has determined the teaching catalog page, it extracts the first keywords from that page. For example, in a course on "Programming Basics", the teaching catalog page may contain several chapter titles, such as "Chapter 1: Overview of Programming Languages" and "Chapter 2: Variables and Data Types". The server takes these chapter titles as the first keywords, which serve as the index basis when the teaching knowledge tree is constructed. Next, the server extracts a second keyword from each teaching title page; these keywords represent the core knowledge point explained under each title page. For example, under "Chapter 2: Variables and Data Types", a teaching title page may read "2.1 Definition and Declaration of Variables", and the server takes "Definition and Declaration of Variables" as the second keyword. The server then analyzes the teaching content pages to determine their association with the teaching catalog page and the teaching title pages, usually by comparing text content, keywords or timestamps. For example, a teaching content page that explains "Definition and Declaration of Variables" in detail is associated with the teaching title page "2.1 Definition and Declaration of Variables" and, through it, with the catalog entry "Chapter 2: Variables and Data Types". Finally, the server constructs the teaching knowledge tree from the collected information. The root node is the course name, such as "Programming Basics". The second-level nodes are the first keywords of the teaching catalog page, that is, the chapter titles. The third-level nodes are the second keywords of the teaching title pages, representing subsections or specific knowledge points. The leaf nodes are the teaching content pages related to these knowledge points, placed in the tree according to their association relationships. In addition, each node (in particular each key knowledge point node) is bound to a timestamp, so that when a user clicks a knowledge point, the server can jump directly to the corresponding time period in the teaching video. Through this process the server builds a structured teaching knowledge tree that shows the hierarchy of the course content and the relationships between knowledge points, helping users find and study specific content more efficiently.
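The four-level tree described above (course, chapter, knowledge point, content page) can be sketched as follows. The `Node` class, the nested-tuple input format, and the function names are assumptions made for illustration; the original text does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    timestamp: Optional[float] = None  # seconds; None for the root node
    children: List["Node"] = field(default_factory=list)


def build_tree(course, chapters):
    """Build the tree from nested tuples.

    chapters: [(chapter_title, ts, [(keyword, ts, [(content_label, ts), ...]), ...]), ...]
    """
    root = Node(course)
    for chapter_title, chapter_ts, sections in chapters:
        chapter = Node(chapter_title, chapter_ts)       # first keyword (catalog page)
        for keyword, section_ts, pages in sections:
            section = Node(keyword, section_ts)         # second keyword (title page)
            section.children = [Node(label, ts) for label, ts in pages]  # content pages
            chapter.children.append(section)
        root.children.append(chapter)
    return root


def find_timestamp(node, label):
    """Depth-first lookup of the timestamp bound to a label."""
    if node.label == label:
        return node.timestamp
    for child in node.children:
        found = find_timestamp(child, label)
        if found is not None:
            return found
    return None


tree = build_tree("Programming Basics", [
    ("Chapter 2: Variables and Data Types", 300.0, [
        ("Definition and Declaration of Variables", 320.0, [
            ("Declaring an integer variable", 345.0),
        ]),
    ]),
])
print(find_timestamp(tree, "Definition and Declaration of Variables"))
```

Binding the timestamp to each node is what lets a click on a knowledge point be translated into a seek position in the video.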

在本发明实施例中，还提供以下实施方式。In the embodiments of the present invention, the following implementation modes are also provided.

In an embodiment of the present invention, by way of example, the server receives a video frame modification request from an operations manager of the teaching platform, who wants to refine the previously auto-generated teaching knowledge tree by adjusting the candidate teaching video frames. Specifically, the operations manager finds that a key teaching content page has been omitted and uses an add operation to include it. In response, the server adds the omitted teaching content page to the multiple candidate teaching video frames and rebuilds the teaching knowledge tree so that the new page is placed correctly in the tree. The server then receives a text adjustment request: the operations manager has found that the text of a teaching title page is wrong and corrects it through the text adjustment operation. The server updates the text of the teaching title page accordingly and rebuilds the teaching knowledge tree so that its text content is accurate. Once satisfied with the modified teaching knowledge tree, the operations manager initiates a knowledge tree save operation. In response, the server saves the modified teaching knowledge tree and binds it to the video identifier of the corresponding teaching video to be analyzed, so that whenever the teaching video is accessed, the server provides the latest version of the teaching knowledge tree bound to it. Through these steps, the server can modify and refine the teaching knowledge tree according to user feedback and ensure that the tree it provides is the latest version matching the corresponding teaching video.
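A minimal sketch of the modify-then-save flow follows. Everything here is hypothetical: the tree is reduced to a dictionary mapping a title to its content pages, and `TreeStore` is an in-memory stand-in for whatever persistent storage a real server would use.

```python
class TreeStore:
    """In-memory stand-in for persistent storage, keyed by video identifier."""

    def __init__(self):
        self._versions = {}  # video_id -> list of saved trees, oldest first

    def save(self, video_id, tree):
        self._versions.setdefault(video_id, []).append(tree)
        return len(self._versions[video_id])  # version number, starting at 1

    def latest(self, video_id):
        return self._versions[video_id][-1]


def add_content_page(tree, title, page):
    """Return a copy of the tree with `page` appended under `title`."""
    updated = {k: list(v) for k, v in tree.items()}
    updated.setdefault(title, []).append(page)
    return updated


def rename_title(tree, old, new):
    """Return a copy of the tree with a corrected title text."""
    return {(new if k == old else k): list(v) for k, v in tree.items()}


store = TreeStore()
tree = {"2.1 Defintion of Variables": ["slide at 05:45"]}  # misrecognized title
store.save("video-001", tree)

tree = rename_title(tree, "2.1 Defintion of Variables", "2.1 Definition of Variables")
tree = add_content_page(tree, "2.1 Definition of Variables", "slide at 07:10")
version = store.save("video-001", tree)
print(version, store.latest("video-001"))
```

Returning a new tree from each operation keeps earlier saved versions unchanged, which is one simple way to guarantee that the latest saved version is always the one served.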

In an embodiment of the present invention, the aforementioned step S201 may be implemented as in the following example.

In an embodiment of the present invention, by way of example, the server first establishes a synchronization mechanism with a preset teaching video database, so that the server is notified whenever a new video file is stored in the database. For example, an educational institution maintains a database dedicated to teaching videos; whenever a teacher uploads a new teaching video, the database notifies the server through the synchronization mechanism. On receiving the notification, the server examines the newly stored teaching video to be processed in order to confirm whether it meets the criteria for further analysis. For example, the server may detect a newly uploaded video file named "Python Programming Basics". The server then determines whether the new video is a screen-recorded demonstration video and whether its duration exceeds a preset minimum duration. A screen-recorded demonstration video usually contains the teacher's explanation together with concrete operation demonstrations, and is an important resource for building a teaching knowledge tree. Duration is also an important criterion, because a video that is too short may not contain enough teaching content. For example, the server may determine that "Python Programming Basics" is a screen-recorded demonstration video with a duration of 1 hour 20 minutes, which exceeds the preset minimum duration (for example, 30 minutes). If the teaching video to be processed satisfies both conditions, the server marks it as a teaching video to be analyzed; in this example, "Python Programming Basics" is selected. If the teaching video to be processed does not satisfy the conditions, for example because it is too short or is not a screen-recorded demonstration video, the server deletes it from the preset teaching video database to save storage space and keep subsequent analysis accurate. Through these steps, the server screens out the screen-recorded demonstration videos that meet the analysis requirements, providing a sound data basis for the subsequent construction of the teaching knowledge tree.
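The screening rule above can be sketched as a simple filter. The `PendingVideo` class and the `is_screen_recording` flag are assumptions: the text does not say how the server decides that a video is a screen-recorded demonstration, so the flag is taken as given.

```python
from dataclasses import dataclass

MIN_DURATION_S = 30 * 60  # example threshold from the text (30 minutes)


@dataclass
class PendingVideo:
    name: str
    duration_s: int
    is_screen_recording: bool  # assumed to be decided by an upstream check


def screen(videos, min_duration_s=MIN_DURATION_S):
    """Split pending videos into (to_analyze, to_delete)."""
    to_analyze, to_delete = [], []
    for video in videos:
        qualifies = video.is_screen_recording and video.duration_s > min_duration_s
        (to_analyze if qualifies else to_delete).append(video)
    return to_analyze, to_delete


pending = [
    PendingVideo("Python Programming Basics", 80 * 60, True),
    PendingVideo("Campus promotional clip", 2 * 60, False),
]
keep, drop = screen(pending)
print([v.name for v in keep], [v.name for v in drop])
```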

为了能够更加清楚的描述本发明实施例的方案，下面提供本发明实施例的一种更为完整的实施方式。In order to more clearly describe the solution of the embodiment of the present invention, a more complete implementation of the embodiment of the present invention is provided below.

一、获取待分析视频：1. Get the video to be analyzed:

用户可以通过两种方式获得待分析的视频。Users can obtain the video to be analyzed in two ways.

一是直接上传本地视频文件；One is to directly upload local video files;

二是通过网络传输方式在线提交视频。The second is to submit the video online through network transmission.

为确保分析的准确性和有效性，所提交的视频需满足特定条件，如必须是含PPT类的录屏视频，排除讲座和宣传类视频，以确保可以准确捕捉到教学信息。视频内容应为教学相关，时长建议在5至45分钟之间，但也可适应更长时长的视频。To ensure the accuracy and effectiveness of the analysis, the submitted videos must meet certain conditions, such as screen recordings containing PPTs, excluding lectures and promotional videos, to ensure that the teaching information can be accurately captured. The video content should be teaching-related, and the recommended length is between 5 and 45 minutes, but longer videos can also be accommodated.

2. Extract frames and identify PPT page screenshots:

系统使用ffmpeg工具对视频进行抽帧操作，按照预设的相同时间间隔对视频进行抽帧操作，生成一系列带有时间戳的静态视频帧。The system uses the ffmpeg tool to extract frames from the video at the same preset time interval to generate a series of static video frames with timestamps.
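As an illustration of equal-interval frame extraction, the sketch below only builds an ffmpeg argument list using ffmpeg's `fps` video filter; it does not run ffmpeg. The function name, the output file pattern, and the choice of PNG output are assumptions, and the text does not state which ffmpeg options the system actually uses.

```python
def ffmpeg_extract_command(video_path, out_dir, interval_s):
    """Build (but do not run) an ffmpeg command that keeps one frame every interval_s seconds."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps=1/{interval_s}",     # one output frame per interval_s seconds
        f"{out_dir}/frame_%06d.png",      # numbered image files
    ]


cmd = ffmpeg_extract_command("lesson.mp4", "frames", 5)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run` on a machine where ffmpeg is installed. The frame number in each file name, multiplied by the interval, gives an approximate timestamp for that frame.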

Frame extraction samples the video at equal time intervals. Assuming the total length of the video is L seconds and the extraction interval is t seconds, the total number of extracted frames N can be calculated by the following formula:

$$N = \left\lfloor \frac{L}{t} \right\rfloor$$

where $\lfloor \cdot \rfloor$ denotes rounding down. Deduplication is performed by computing the visual difference between consecutive frames, which can be measured with the structural similarity index (SSIM). SSIM measures the visual similarity of two images; its value lies between -1 and 1, where 1 means the images are identical. A threshold τ (for example, 0.85) is set; when the SSIM of two frames is greater than τ, the frames are considered similar and the later frame is removed.
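The sampling count and the SSIM threshold rule can be sketched as below, assuming the frame count is the rounded-down quotient L / t. Two simplifications are assumptions of this sketch: SSIM is computed over the whole image as a single window (library implementations usually average over local windows), and each frame is compared with the last kept frame. Images are flat lists of grayscale values so that the example needs no external library.

```python
import math


def sample_timestamps(length_s, interval_s):
    """Nominal timestamps of the N = floor(L / t) sampled frames, starting at 0 (assumed)."""
    n = math.floor(length_s / interval_s)
    return [k * interval_s for k in range(n)]


def global_ssim(a, b, max_value=255.0):
    """Single-window SSIM of two equally sized grayscale images given as flat lists."""
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def deduplicate(frames, tau=0.85):
    """Drop a frame when its SSIM with the last kept frame is greater than tau."""
    kept = []
    for timestamp, pixels in frames:
        if kept and global_ssim(kept[-1][1], pixels) > tau:
            continue
        kept.append((timestamp, pixels))
    return kept


slide_a = [0, 0, 255, 255] * 4
slide_b = [255, 255, 0, 0] * 4           # a visibly different slide
timestamps = sample_timestamps(15, 5)    # L = 15 s, t = 5 s
frames = list(zip(timestamps, [slide_a, list(slide_a), slide_b]))
kept = deduplicate(frames)
print([ts for ts, _ in kept])
```

With L = 15 and t = 5 the formula gives N = 3 frames; the second frame repeats the first and is removed, so two frames remain.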

After frame extraction, many of the frames usually show similar content. An image similarity algorithm, such as the structural similarity index (SSIM) or average hashing (aHash), is used to compare consecutive frames and remove highly similar duplicates, reducing the subsequent computational load.
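For the aHash alternative, a minimal pure-Python sketch is shown below. It assumes each frame is already a grayscale image stored as a list of rows whose dimensions are multiples of 8, and it downscales by block averaging; an image library would normally handle the resizing. The function names are illustrative.

```python
def average_hash(gray, size=8):
    """64-bit average hash of a grayscale image given as a list of rows."""
    height, width = len(gray), len(gray[0])
    block_h, block_w = height // size, width // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [
                gray[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)  # 1 where the cell is brighter than average
    return bits


def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")


left_bright = [[200] * 8 + [50] * 8 for _ in range(16)]
slightly_brighter = [[p + 3 for p in row] for row in left_bright]
mirrored = [row[::-1] for row in left_bright]

print(hamming(average_hash(left_bright), average_hash(slightly_brighter)))
print(hamming(average_hash(left_bright), average_hash(mirrored)))
```

A small Hamming distance indicates near-duplicate frames; the cut-off used to decide that two frames are duplicates is a tuning parameter that the text does not specify.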

三、自动识别目录页：3. Automatically identify the catalog page:

利用OCR技术，系统从关键帧中自动识别出目录页。通常，目录页包含的是整个演示文稿的大纲或章节标题。可以采用文本分析的方式，通过OCR(光学字符识别)技术提取关键帧中的文本内容，并分析这些文本以查找可能指示目录页的关键词，如“目录”、“大纲”、“章节”等。Using OCR technology, the system automatically identifies the table of contents from the keyframes. Usually, the table of contents contains the outline or chapter titles of the entire presentation. Text analysis can be used to extract the text content in the keyframes through OCR (Optical Character Recognition) technology and analyze the text to find keywords that may indicate a table of contents, such as "table of contents", "outline", "chapter", etc.

若抽帧过程捕获到多个相似的目录页，系统依据置信度评分保留最清晰(置信度最高)的一张目录页。OCR过程中，将图像转换为灰度色彩空间，并进行必要的降噪处理，应用文字检测算法定位文本区域，最后采用Tesseract等OCR引擎对检测到的文字块进行识别。对于识别出的文字，系统进一步执行关键词提取和语义分析，以便为后续的知识抽取和视频索引提供支持。若出现识别错误或漏识别的情况，用户可手动介入，标注并修正主目录页的内容。If the frame extraction process captures multiple similar catalog pages, the system retains the clearest (highest confidence) catalog page based on the confidence score. During the OCR process, the image is converted to a grayscale color space, and necessary noise reduction processing is performed. The text detection algorithm is applied to locate the text area, and finally the detected text blocks are recognized using OCR engines such as Tesseract. For the recognized text, the system further performs keyword extraction and semantic analysis to provide support for subsequent knowledge extraction and video indexing. If recognition errors or missed recognition occur, users can manually intervene to mark and correct the content of the main catalog page.
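The keyword check and the "keep the highest-confidence page" rule can be sketched as follows. The OCR step itself is not shown: the `OcrFrame` class and its `confidence` field are assumptions standing in for whatever the OCR engine returns, and a plain substring match is only a rough stand-in for the text analysis described above.

```python
from dataclasses import dataclass

# Keywords named in the text, in both languages.
CATALOG_KEYWORDS = ("目录", "大纲", "章节", "table of contents", "outline", "chapter")


@dataclass
class OcrFrame:
    timestamp: float
    text: str
    confidence: float  # assumed mean OCR confidence in [0, 1]


def is_catalog_page(frame):
    """True if the recognized text contains at least one catalog keyword."""
    text = frame.text.lower()
    return any(keyword in text for keyword in CATALOG_KEYWORDS)


def pick_catalog_page(frames):
    """Return the catalog candidate with the highest confidence, or None."""
    candidates = [f for f in frames if is_catalog_page(f)]
    return max(candidates, key=lambda f: f.confidence, default=None)


frames = [
    OcrFrame(0.0, "Table of Contents\n1. Overview\n2. Variables", 0.81),
    OcrFrame(5.0, "Table of Contents\n1. Overview\n2. Variables", 0.93),
    OcrFrame(10.0, "2.1 Definition and declaration of variables", 0.95),
]
best = pick_catalog_page(frames)
print(best.timestamp)
```

A substring rule like this will also match title pages that contain the word "chapter", so a real system would need stricter criteria or the manual correction described above.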

四、自动识别标题页：4. Automatically identify the title page:

The system further identifies frames containing first-level, second-level and third-level titles, and automatically locates and boxes the title area.

标题页的识别一般通过文本分析或样式匹配方式识别，通常标题页会包含整个演示文稿的主题或标题，这些文字往往具有较大的字体大小和醒目的位置。通过OCR技术提取关键帧中的文本内容，可以分析文本的字体大小、位置和关键词，以确定是否为标题页。The title page is generally identified through text analysis or style matching. Usually, the title page contains the theme or title of the entire presentation, and these words often have a large font size and a prominent position. By extracting the text content in the key frame through OCR technology, the font size, position and keywords of the text can be analyzed to determine whether it is a title page.
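One way to turn the font-size and position cues into a rule is sketched below. The `TextBox` fields, the thresholds, and the rule itself are all assumptions chosen for illustration; the text does not give concrete criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    text: str
    top: int     # y coordinate of the box's top edge, in pixels
    height: int  # box height in pixels, used as a proxy for font size


def looks_like_title_page(boxes: List[TextBox], frame_height: int,
                          min_height_ratio: float = 0.08, max_boxes: int = 4) -> bool:
    """Heuristic: few text lines, and the tallest one is large and in the upper two thirds."""
    if not boxes or len(boxes) > max_boxes:
        return False
    tallest = max(boxes, key=lambda b: b.height)
    is_large = tallest.height >= min_height_ratio * frame_height
    center_y = tallest.top + tallest.height / 2
    is_prominent = center_y <= frame_height * 2 / 3
    return is_large and is_prominent


title_frame = [TextBox("2.1 Definition and Declaration of Variables", top=300, height=90)]
content_frame = [TextBox(f"bullet {i}", top=100 + 60 * i, height=30) for i in range(6)]

print(looks_like_title_page(title_frame, frame_height=720))
print(looks_like_title_page(content_frame, frame_height=720))
```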

标题文字的识别和转写，除了应用上述OCR技术外，还将利用自然语言处理(NLP)中的句法分析和依存关系分析来理解文本结构和语义。In addition to applying the above-mentioned OCR technology, the recognition and transcription of title text will also utilize syntactic analysis and dependency analysis in natural language processing (NLP) to understand the text structure and semantics.

在此基础上，系统根据标题内容提取关键知识点，这可能涉及归纳总结和简化段落内容。如果存在专业知识点词库，系统可以直接对照该词库提取关键词；若无现成词库，则需借助大语言模型LLM进行知识点的归纳和凝练。On this basis, the system extracts key knowledge points based on the title content, which may involve summarizing and simplifying the paragraph content. If there is a professional knowledge point vocabulary, the system can directly refer to the vocabulary to extract keywords; if there is no ready-made vocabulary, it is necessary to use the large language model LLM to summarize and condense the knowledge points.
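The two extraction paths (vocabulary lookup first, language model otherwise) can be sketched as a single function. The `summarize` parameter is a hypothetical caller-supplied callable standing in for a large language model; no particular model API is implied. Falling back to the model when the vocabulary yields no match is an assumption of this sketch.

```python
def extract_keywords(title, vocabulary=None, summarize=None):
    """Extract knowledge-point keywords from a title.

    vocabulary: optional list of domain terms to match against the title.
    summarize:  optional callable(title) -> list of keywords (stand-in for an LLM).
    """
    if vocabulary:
        lowered = title.lower()
        hits = [term for term in vocabulary if term.lower() in lowered]
        if hits:
            return hits
    if summarize is not None:
        return summarize(title)
    return [title.strip()]  # last resort: use the title itself


title = "2.1 Definition and Declaration of Variables"

from_vocab = extract_keywords(title, vocabulary=["variable", "data type", "loop"])


def fake_summarize(text):
    # Placeholder for a model call: drops the section number.
    return [text.split(" ", 1)[1]]


from_model = extract_keywords(title, summarize=fake_summarize)
print(from_vocab, from_model)
```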

5. Identify non-catalog pages, i.e. content pages:

Apart from the catalog page and the title pages, all other pages are treated as content pages.

对于非目录页的内容，系统将计算其与目录页的关联度，以确定其在视频知识结构中的位置。For non-catalog page content, the system will calculate its relevance to the catalog page to determine its position in the video knowledge structure.

此外，系统还会对内容页进行去重处理，剔除重复或非常相似的帧，只保留最具代表性和相关性的帧内容。In addition, the system will deduplicate content pages, remove duplicate or very similar frames, and retain only the most representative and relevant frame content.
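The text does not say how the relevance between a content page and the catalog is computed. As one possible reading, the sketch below scores each catalog entry by the Jaccard overlap of word sets and picks the best match. The tokenizer only handles ASCII words, so Chinese text would need a proper word segmenter; all names here are illustrative.

```python
import re


def tokens(text):
    """Lowercased set of ASCII words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def assign_to_entry(page_text, catalog_entries):
    """Return (score, entry) for the catalog entry most similar to the page text."""
    return max((jaccard(page_text, entry), entry) for entry in catalog_entries)


entries = ["Overview of Programming Languages", "Variables and Data Types"]
score, entry = assign_to_entry("Declaring variables: name, type, and initial value", entries)
print(entry, round(score, 3))
```

Word overlap is a deliberately simple measure; timestamps (a content page usually follows its title page) or embedding-based similarity are other plausible signals.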

六、人工辅助核对信息：6. Manual assisted information verification:

尽管自动化技术大大提高了处理效率，但人工核对仍然不可或缺。用户需要复查系统自动识别的结果，一旦发现误识别或缺失的信息，应及时进行手动修正和调整，确保每一张标题页及其内容都准确无误。Although automation technology has greatly improved processing efficiency, manual verification is still indispensable. Users need to review the results of the system's automatic recognition. Once misidentification or missing information is found, manual corrections and adjustments should be made in a timely manner to ensure that every title page and its content are accurate.

七、生成视频篇章知识树内容：7. Generate video chapter knowledge tree content:

经过以上步骤，系统最终将所有关键帧的知识元素合并为一棵包含单一根节点的知识树，直观地展现出整个视频的知识结构，方便用户快速理解和导航。After the above steps, the system finally merges the knowledge elements of all key frames into a knowledge tree with a single root node, which intuitively displays the knowledge structure of the entire video and facilitates users to quickly understand and navigate.

八、同时生成带标注时间轴信息的知识点内容：8. Generate knowledge points with annotated timeline information at the same time:

系统不仅提供知识点的详细描述，还将每一知识点与对应的时间戳相结合，形成时间轴上的标注。这使得用户可以便捷地跳转到视频中的具体片段，提高了学习的针对性和效率。The system not only provides a detailed description of the knowledge point, but also combines each knowledge point with the corresponding timestamp to form a timeline annotation. This allows users to easily jump to a specific segment in the video, improving the pertinence and efficiency of learning.
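A timeline annotation of this kind can be sketched as a sorted list of formatted entries. The input format, a list of (knowledge point, start time in seconds) pairs, and the `HH:MM:SS` display format are assumptions.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timeline(points):
    """points: iterable of (knowledge_point, start_seconds); returns lines sorted by time."""
    ordered = sorted(points, key=lambda p: p[1])
    return [f"{format_timestamp(start)}  {name}" for name, start in ordered]


lines = timeline([
    ("Data types", 905),
    ("Definition and declaration of variables", 320),
])
print("\n".join(lines))
```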

为了能够实施上面的整体方案，下面技术方案中提供一种针对视频教学内容的分析系统的架构。In order to implement the above overall solution, the following technical solution provides an architecture of an analysis system for video teaching content.

视频获取单元：Video acquisition unit:

上传本地视频：用户可以通过界面上传带PPT录屏的视频文件，支持多种常见视频格式，如avi、mp4、flv等。这为用户提供了灵活性，可以方便地将不同格式的视频文件上传到系统中。Upload local video: Users can upload video files with PPT screen recording through the interface, supporting multiple common video formats such as avi, mp4, flv, etc. This provides users with flexibility and allows them to easily upload video files of different formats to the system.

数据同步获取视频：系统设置了定时同步机制，能够自动将符合条件的视频传输到视频获取单元中。这一机制确保了系统中的视频资源始终与原始数据保持同步，无需手动干预。Data synchronization video acquisition: The system has set up a timed synchronization mechanism, which can automatically transfer qualified videos to the video acquisition unit. This mechanism ensures that the video resources in the system are always synchronized with the original data without manual intervention.

准入筛选：目前上传的视频尚未实现自动化的准入筛选机制，因此需要人工进行准入筛选。为了加快准入流程，后续可以考虑增加一种对PPT录屏类视频的自动判断机制，以减少人工操作。Access screening: Currently, there is no automated access screening mechanism for uploaded videos, so manual access screening is required. In order to speed up the access process, we can consider adding an automatic judgment mechanism for PPT screen recording videos to reduce manual operations.

视频时长判断：系统会自动判断视频时长，如果视频过短(小于5分钟)，则提示上传视频失败，并不进行视频存储。这是为了确保系统中存储的视频具有一定的内容和质量。Video length judgment: The system will automatically judge the video length. If the video is too short (less than 5 minutes), it will prompt that the video upload failed and will not be stored. This is to ensure that the videos stored in the system have certain content and quality.
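The upload checks can be sketched as one validation function. The extension list contains only the three formats named in the text (the text says "etc.", so more may be supported), and the function name and messages are illustrative.

```python
import os

ALLOWED_EXTENSIONS = {".avi", ".mp4", ".flv"}  # formats named in the text
MIN_DURATION_S = 5 * 60                          # videos shorter than 5 minutes are rejected


def validate_upload(filename, duration_s):
    """Return (accepted, message) for an uploaded video."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {extension or 'none'}"
    if duration_s < MIN_DURATION_S:
        return False, "video upload failed: shorter than 5 minutes"
    return True, "accepted"


print(validate_upload("lesson.MP4", 1200))
print(validate_upload("clip.mp4", 120))
```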

视频存储单元：Video storage unit:

Uploaded video resources are stored automatically in the video storage unit, where users can see at a glance which videos have been analyzed and which have not. Users can also add, delete, modify and query videos to meet different management needs.

标注状态：系统会显示视频的标注状态，包括AI分析中和分析完成两种状态。这有助于用户了解视频的处理进度，并及时采取相应的操作。Annotation status: The system will display the annotation status of the video, including AI analysis in progress and analysis completed. This helps users understand the progress of video processing and take appropriate actions in a timely manner.

发布状态：系统会显示视频的发布状态，包括已发布和未发布两种情况。这可以帮助用户了解视频是否已经准备好供他人观看或使用。Publishing status: The system will display the publishing status of the video, including published or unpublished. This can help users understand whether the video is ready for others to watch or use.

更新时间：系统会记录最新的编辑时间，以便用户了解视频的最新更新情况。Update time: The system will record the latest editing time so that users can understand the latest updates of the video.

编辑功能：用户可以通过点击内容标注单元，对单个视频进行编辑。这为用户提供了更多的控制权，可以根据需求对视频进行修改和调整。Editing function: Users can edit individual videos by clicking on the content annotation unit. This provides users with more control and allows them to modify and adjust the video as needed.

预览结果：系统提供了预览结果的功能，用户可以在编辑完成后预览视频的效果。这有助于用户确认编辑结果是否符合预期，并进行必要的调整。Preview results: The system provides a preview function, and users can preview the effect of the video after editing. This helps users confirm whether the editing results meet expectations and make necessary adjustments.

内容标注单元：Content annotation unit:

(1)目录页识别：利用OCR技术，系统从视频帧中自动识别出目录页。然后，对目录页中的文本内容执行OCR识别和关键词梳理，为后续的知识抽取和视频索引提供支持。若出现识别错误或漏识别的情况，用户可手动介入，标注并修正主目录页的内容。(1) Catalog page recognition: Using OCR technology, the system automatically recognizes the catalog page from the video frame. Then, OCR recognition and keyword combing are performed on the text content in the catalog page to provide support for subsequent knowledge extraction and video indexing. If there are recognition errors or missed recognitions, users can manually intervene to mark and correct the content of the main catalog page.

(2)标题页识别：(2) Title page identification:

系统进一步识别含有一级、二级、三级标题的图像画面，自动定位并框选出标题区域。接着，对标题文字进行识别和转写，使其成为可编辑的文本。The system further identifies images containing first-level, second-level, and third-level titles, automatically locates and frames the title area, and then recognizes and transcribes the title text to make it editable text.

在此基础上，系统根据标题内容提取关键知识点，这可能涉及归纳总结和简化段落内容。On this basis, the system extracts key knowledge points based on the title content, which may involve summarizing and simplifying paragraph content.

如果存在专业知识点词库作为输入，系统可以直接对照该词库提取关键词；若无现成词库，则需借助大语言模型LLM如GPT、文心大模型等语义理解后进行知识点的归纳和凝练。If there is a vocabulary of professional knowledge points as input, the system can directly extract keywords by referring to the vocabulary; if there is no ready-made vocabulary, it is necessary to use large language models LLM such as GPT, Wenxin model, etc. to summarize and condense knowledge points after semantic understanding.

(3)内容页识别：对于非目录页的内容，系统计算其与目录页的关联度，以确定其在视频知识结构中的位置。同时，对内容页进行去重处理，剔除重复或非常相似的帧，只保留最具代表性和相关性的帧内容。(3) Content page identification: For non-catalog page content, the system calculates its relevance to the catalog page to determine its position in the video knowledge structure. At the same time, the content page is deduplicated, and duplicate or very similar frames are removed, retaining only the most representative and relevant frame content.

以上识别的所有页面均带起始时间点和结束时间点，重复的页面在剔除页面时需要将多张页面出现的时间段进行合并。All the pages identified above have a start time point and an end time point. When removing duplicate pages, the time periods in which multiple pages appear need to be merged.
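Merging the time periods of duplicate pages is a standard interval merge. In the sketch below each appearance of a page is a (start, end) pair in seconds, and `gap_s` is an assumed tolerance for joining appearances that are separated by a short gap.

```python
def merge_intervals(intervals, gap_s=0.0):
    """Merge (start, end) intervals that overlap or lie within gap_s of each other."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + gap_s:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# The same slide captured in three consecutive samples, then shown again later.
appearances = [(15, 20), (10, 15), (20, 25), (90, 95)]
print(merge_intervals(appearances))
```

The later reappearance stays a separate interval, so a page that the lecturer returns to keeps both of its time periods.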

内容修正单元：Content Modification Unit:

The content modification unit is responsible for correcting and optimizing the content that has been identified and annotated.

(1)新增关键帧图片：(1) Add key frame images:

用户可以通过在视频播放过程中截图的方式，新增关键帧图片。这一功能使得用户能够补充系统在自动识别过程中可能遗漏的关键帧，从而确保视频内容的完整性。Users can add keyframe images by taking screenshots during video playback. This function allows users to supplement keyframes that may be missed by the system during automatic recognition, thereby ensuring the integrity of the video content.

通过截图工具或接口，选择视频中的特定画面作为新的关键帧，并将其添加到系统中。Through the screenshot tool or interface, select specific frames in the video as new key frames and add them to the system.

(2)删除关键帧图片：(2) Delete key frame images:

系统有进行重复关键帧图片的剔除，但仍不可避免会存在高度相似的内容。如果系统在自动识别过程中产生了重复的或高度相似的关键帧图片，用户可以通过删除功能去除这些冗余的内容。The system removes duplicate keyframe images, but it is inevitable that there will be highly similar content. If the system generates duplicate or highly similar keyframe images during the automatic recognition process, users can use the delete function to remove these redundant content.

(3)调整关键帧图片顺序：(3) Adjust the order of key frame images:

In some cases (for example, because of the order in which the lecturer presents the material), the system may sort the keyframe images incorrectly. Users can use the reordering function to correct the order of the keyframes manually, so that the video content remains coherent and logical.

通过拖放界面或排序工具，用户可以轻松地改变关键帧的顺序，使其与实际视频内容相匹配。Through the drag-and-drop interface or sequencing tools, users can easily change the order of keyframes to match the actual video content.

(4)修正标题内容识别错误：(4) Correct title content recognition errors:

在标题页识别过程中，可能会出现文字识别错误或转写不准确的情况。内容修正单元提供了修正标题内容的功能，用户可以手动编辑和修改错误的标题文字。During the title page recognition process, text recognition errors or inaccurate transcription may occur. The content correction unit provides a function to correct the title content, and users can manually edit and modify incorrect title text.

通过提供文本编辑工具，用户可以直接在系统中更正标题内容，确保标题的准确性和一致性。By providing text editing tools, users can correct title content directly in the system to ensure the accuracy and consistency of the title.

结果预览单元：Result preview unit:

Preview the structured video chapter structure: shows the complete chapter structure of a single video;

Preview the video summary: generates a video summary from the chapter structure content;

Preview the knowledge point positioning effect: outputs a list of knowledge points with timestamps, based on the timing information carried by the keyframes and chapter results.

下面针对结果预览单元进行详细的说明。The result preview unit is described in detail below.

预览结构化视频篇章结构：这一功能使用户能够直观地看到单个视频的完整视频篇章结构效果。篇章结构通常是指视频内容的组织和布局，包括各个部分或章节的划分。通过预览篇章结构，用户可以快速理解视频的整体框架和逻辑流程，这有助于后续对视频内容的深入分析和处理。这种结构化预览特别适用于长视频或复杂内容的视频，因为它可以帮助用户更快地把握视频的主旨和要点。Preview structured video chapter structure: This feature allows users to intuitively see the complete video chapter structure effect of a single video. Chapter structure generally refers to the organization and layout of video content, including the division of various parts or chapters. By previewing the chapter structure, users can quickly understand the overall framework and logical flow of the video, which helps with subsequent in-depth analysis and processing of the video content. This structured preview is particularly suitable for long videos or videos with complex content, as it can help users grasp the main idea and key points of the video more quickly.

预览视频摘要：根据视频的篇章结构内容，系统可以自动生成视频摘要。这个摘要是对视频主要内容的精简和提炼，旨在帮助用户快速了解视频的核心信息。通过预览视频摘要，用户可以迅速判断该视频是否符合自己的兴趣或需求，而无需观看整个视频。在信息爆炸的时代，这种摘要预览功能对于筛选和过滤大量视频内容具有重要意义。Preview video summary: Based on the chapter structure of the video, the system can automatically generate a video summary. This summary is a simplification and refinement of the main content of the video, aiming to help users quickly understand the core information of the video. By previewing the video summary, users can quickly determine whether the video meets their interests or needs without having to watch the entire video. In the era of information explosion, this summary preview function is of great significance for screening and filtering a large amount of video content.

预览知识点定位效果：这一功能利用前面的关键帧和篇章结构分析结果，输出带有时间戳的知识点内容列表。用户通过预览这个列表，可以快速定位到视频中的关键信息或重要知识点，大大提高了浏览和学习效率。对于教育、培训或研究类视频来说，这种知识点定位功能尤为重要，因为它能帮助用户直达核心内容，节省时间和精力。Preview knowledge point positioning effect: This function uses the previous key frame and chapter structure analysis results to output a knowledge point content list with timestamps. By previewing this list, users can quickly locate key information or important knowledge points in the video, greatly improving browsing and learning efficiency. For education, training or research videos, this knowledge point positioning function is particularly important because it can help users get to the core content directly, saving time and energy.

因此，“结果预览单元”中的这些预览功能在视频处理和分析中扮演着重要角色。它们不仅提升了用户浏览视频信息的效率，还为用户提供了更加便捷和高效的方式来掌握和理解视频内容。在快速浏览视频信息、掌握视频内容方面，这些功能无疑具有较大的意义。此外，结合大模型技术生成的文章摘要等，可以进一步丰富用户对视频内容的理解和应用。例如，用户可以根据摘要快速判断视频的主题和价值，也可以根据知识点定位快速找到感兴趣或需要深入研究的内容部分。这些功能对于需要从大量视频中快速获取信息的用户来说，具有极高的实用价值。Therefore, these preview functions in the "result preview unit" play an important role in video processing and analysis. They not only improve the efficiency of users browsing video information, but also provide users with a more convenient and efficient way to grasp and understand video content. These functions are undoubtedly of great significance in quickly browsing video information and grasping video content. In addition, combined with the article abstracts generated by large model technology, users' understanding and application of video content can be further enriched. For example, users can quickly judge the theme and value of the video based on the abstract, or quickly find the content part that they are interested in or need to be studied in depth based on the knowledge point positioning. These functions are of great practical value for users who need to quickly obtain information from a large number of videos.

Video timestamp positioning, in turn, provides strong technical support for cutting video content into knowledge points. Its role and significance in knowledge point cutting are described in detail below.

1.精准定位与快速检索1. Accurate positioning and fast retrieval

通过时间戳定位，我们可以精确地标记视频中每一个关键知识点的开始和结束时间。这对于学习者和教育者来说都极为有用。学习者可以直接跳转到他们感兴趣或需要复习的知识点，而教育者则可以快速找到并展示教学中的重点内容。Through timestamp positioning, we can accurately mark the start and end time of each key knowledge point in the video. This is extremely useful for both learners and educators. Learners can jump directly to the knowledge points they are interested in or need to review, while educators can quickly find and display the key points in the teaching.

2.个性化学习体验2. Personalized learning experience

每个人的学习需求和节奏都是不同的。有了时间戳定位，学习者可以根据自己的需要，选择性地观看和学习视频中的特定部分，从而实现更加个性化的学习体验。Everyone's learning needs and pace are different. With timestamp positioning, learners can selectively watch and learn specific parts of the video according to their needs, thus achieving a more personalized learning experience.

3.提高学习效率3. Improve learning efficiency

在没有时间戳定位的情况下，学习者可能需要花费大量时间来寻找和定位视频中的关键信息。而有了精确的时间戳，学习者可以迅速找到并专注于视频中的核心内容，从而大大提高学习效率。Without timestamp positioning, learners may need to spend a lot of time to find and locate key information in the video. With accurate timestamps, learners can quickly find and focus on the core content of the video, greatly improving learning efficiency.

4. Flexible cutting and recombination of teaching resources

教育者可以利用时间戳定位技术，将视频内容切割成多个独立的知识点片段。这些片段不仅可以作为单独的教学资源使用，还可以根据需要进行重新组合，创建出更加灵活和多样的教学内容。Educators can use timestamp positioning technology to cut video content into multiple independent knowledge point segments. These segments can not only be used as separate teaching resources, but can also be recombined as needed to create more flexible and diverse teaching content.
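As an illustration of cutting a video at knowledge point boundaries, the sketch below builds one ffmpeg argument list per segment using the `-ss`, `-to` and `-c copy` options; it does not run ffmpeg. The function name, the output file naming, and the use of stream copy are assumptions, and the text does not state how segments are actually produced.

```python
def cut_commands(video_path, segments):
    """segments: list of (label, start_s, end_s). Returns a list of (label, ffmpeg_args)."""
    commands = []
    for index, (label, start, end) in enumerate(segments, start=1):
        output = f"segment_{index:02d}.mp4"
        args = [
            "ffmpeg",
            "-i", video_path,
            "-ss", str(start),   # segment start, in seconds
            "-to", str(end),     # segment end, in seconds
            "-c", "copy",        # copy streams without re-encoding
            output,
        ]
        commands.append((label, args))
    return commands


commands = cut_commands("lesson.mp4", [("Definition and declaration of variables", 320, 905)])
print(commands[0][1])
```

Stream copy is fast but can only cut at keyframes, so segment boundaries may be slightly off; re-encoding instead of copying trades speed for exact cuts.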

5. Reuse and innovation of video content

Precise timestamp positioning makes video content easier to reuse. Educators or content creators can extract specific parts of a video for secondary creation or integration, thereby generating new teaching content or products.

6. Improved interactivity and engagement

With timestamp positioning, educators can embed interactive elements such as questions, discussions or quizzes in an online teaching platform and associate each of them with a specific time point in the video. This interactivity not only increases learner engagement, but also helps educators better understand how well learners have mastered the material.

7. Broader application scenarios for video teaching

With the rise of online education, video timestamp positioning means that video teaching is no longer limited to the traditional classroom environment. Whether for corporate training, distance education or self-study, accurate timestamp positioning can give learners a more flexible and efficient learning experience.

Video timestamp positioning therefore plays an indispensable role in knowledge point segmentation. It improves the efficiency and flexibility of learning, and gives educators and learners richer and more diverse teaching resources and learning methods. As the technology continues to develop, video timestamp positioning can be expected to play an increasingly important role in education.
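The way start and end times support jumping to a knowledge point can be illustrated with a minimal sketch. It assumes each knowledge point is already known as a (title, start time) pair and that a segment ends where the next one begins; the names `Segment`, `build_segments` and `locate`, and the sample data, are hypothetical and not taken from the embodiment.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    title: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def build_segments(points, video_duration):
    """Turn (title, start_time) pairs into segments with start and end times."""
    ordered = sorted(points, key=lambda p: p[1])
    segments = []
    for i, (title, start) in enumerate(ordered):
        # Assumption: a segment ends where the next knowledge point begins,
        # and the last segment ends with the video.
        end = ordered[i + 1][1] if i + 1 < len(ordered) else video_duration
        segments.append(Segment(title, start, end))
    return segments


def locate(segments, t):
    """Return the segment that covers playback time t, or None if there is none."""
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1
    if i < 0 or t >= segments[i].end:
        return None
    return segments[i]


# Hypothetical knowledge points, deliberately given out of order.
points = [("Limits", 95.0), ("Introduction", 0.0), ("Derivatives", 310.0)]
segments = build_segments(points, video_duration=600.0)
```

Because each segment carries both boundaries, the same list can drive a "jump to knowledge point" control and the cutting of the video into standalone clips.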

Content result saving unit:

Once the chapter results have been confirmed as correct, they can be saved. Saving completes the production process for one video, after which the production process for the next video can begin.
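As a rough illustration of the save step, the sketch below serializes a confirmed result and stores it under the identifier of its video. The dictionary `store` stands in for whatever persistence layer is actually used, and the function names, the identifier `video-001` and the tree layout are assumptions made for this example only.

```python
import json


def save_result(store, video_id, knowledge_tree):
    """Serialize the confirmed tree and store it under its video ID."""
    payload = json.dumps(
        {"video_id": video_id, "tree": knowledge_tree},
        ensure_ascii=False,  # keep non-ASCII titles readable in the stored text
        sort_keys=True,
    )
    store[video_id] = payload
    return payload


def load_result(store, video_id):
    """Read back the tree that was saved for the given video ID."""
    return json.loads(store[video_id])["tree"]


# Hypothetical confirmed result for one video.
store = {}
tree = {"title": "1. Limits", "timestamp": 12.0, "children": []}
save_result(store, "video-001", tree)
restored = load_result(store, "video-001")
```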

In addition, the embodiment of the present invention can also analyze the audio signal, thereby resolving possible inconsistencies between sound and picture. Through deeply integrated analysis of audio and video content, the system can identify and annotate the knowledge points and important information in the video more accurately, providing users with a more complete learning aid.

Such a design has the following beneficial effects:

Improved learning efficiency: by quickly locating and summarizing the key knowledge points of teaching videos, the present invention can effectively improve learners' learning efficiency. Knowledge graphs can also be constructed across multiple videos to improve learning transfer and personalized recommendation, ultimately improving learning efficiency.

Intelligent retrieval and browsing: intelligent retrieval over large-scale video data is supported, so users can quickly find or locate target video clips and save browsing time.

Course production tool optimization: video content producers are provided with efficient editing tools, and the production process is optimized in reverse from end-user needs, improving production efficiency.

Automatic decomposition of knowledge points: video content is automatically decomposed into individual knowledge point segments, facilitating subsequent personalized learning and content reorganization.

Quick positioning and navigation: the generated knowledge tree and annotated timeline enable users to navigate quickly within the video and find the exact clips they need.

Better use of video resources: course videos on training platforms can be quickly summarized, maximizing the utility of the resources.

Please refer to FIG. 2, which shows an analysis device 110 for video teaching content provided by an embodiment of the present invention, including:

an acquisition module 1101, configured to acquire a teaching video to be analyzed;

an analysis module 1102, configured to perform frame extraction processing on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, each of which is configured with a timestamp; to determine a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames respectively; and to construct a teaching knowledge tree, based on the teaching catalog page, the teaching title page and the teaching content page, as the analysis result of the teaching video to be analyzed.
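The page classification and tree construction performed by the analysis module can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: text lines and font sizes are assumed to have been extracted already (for example by an OCR step), and the keyword list, the font-size threshold, the line-count limit and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

CATALOG_KEYWORDS = ("contents", "catalog", "agenda", "目录")  # assumed keyword list
TITLE_MIN_FONT = 40   # assumed font-size threshold for a title, in pixels
TITLE_MAX_LINES = 2   # assumed: a title page carries very little text


@dataclass
class TextLine:
    text: str
    font_size: int


@dataclass
class Frame:
    timestamp: float  # seconds from the start of the video
    lines: list       # TextLine items, assumed to come from a text-extraction step


@dataclass
class Node:
    title: str
    timestamp: float
    children: list = field(default_factory=list)


def classify(frame):
    """Label a frame as 'catalog', 'title' or 'content'."""
    text = " ".join(line.text.lower() for line in frame.lines)
    if any(keyword in text for keyword in CATALOG_KEYWORDS):
        return "catalog"
    few_lines = 0 < len(frame.lines) <= TITLE_MAX_LINES
    if few_lines and max(l.font_size for l in frame.lines) >= TITLE_MIN_FONT:
        return "title"
    return "content"  # everything that is neither catalog nor title


def build_tree(frames):
    """Return (index entries from catalog pages, root of the knowledge tree)."""
    index, root = [], Node("root", 0.0)
    current = root
    for frame in sorted(frames, key=lambda f: f.timestamp):
        kind = classify(frame)
        heading = frame.lines[0].text if frame.lines else ""
        if kind == "catalog":
            # Catalog lines other than the keyword line become index entries.
            index.extend(
                l.text for l in frame.lines
                if not any(k in l.text.lower() for k in CATALOG_KEYWORDS)
            )
        elif kind == "title":
            current = Node(heading, frame.timestamp)
            root.children.append(current)
        else:
            current.children.append(Node(heading, frame.timestamp))
    return index, root


# Hypothetical frames standing in for de-duplicated candidate frames.
frames = [
    Frame(0.0, [TextLine("Contents", 48), TextLine("1. Limits", 28),
                TextLine("2. Derivatives", 28)]),
    Frame(12.0, [TextLine("1. Limits", 56)]),
    Frame(20.0, [TextLine("Definition of a limit", 30),
                 TextLine("lim f(x) as x approaches a", 22)]),
    Frame(95.0, [TextLine("2. Derivatives", 56)]),
    Frame(101.0, [TextLine("Slope of the tangent line", 30),
                  TextLine("f'(x) is the limit of the difference quotient", 22)]),
]
index, root = build_tree(frames)
```

Each node keeps the timestamp of the frame it came from, so every entry in the tree remains bound to a position in the video.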

It should be noted that, for the implementation principle of the aforementioned analysis device 110 for video teaching content, reference may be made to the implementation principle of the aforementioned analysis method for video teaching content, which is not repeated here. It should be understood that the division of the above device into modules is only a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the analysis device 110 for video teaching content may be a separately provided processing element, or may be integrated into a chip of the above device; it may also be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking the code and executing the functions of the analysis device 110 for video teaching content. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

An embodiment of the present invention provides a computer device 100, which includes a processor and a non-volatile memory storing computer instructions. When the computer instructions are executed by the processor, the computer device 100 runs the aforementioned analysis device 110 for video teaching content. FIG. 3 is a structural block diagram of the computer device 100 provided by the embodiment of the present invention. The computer device 100 includes the analysis device 110 for video teaching content, a memory 111, a processor 112 and a communication unit 113.

To enable data transmission or interaction, the memory 111, the processor 112 and the communication unit 113 are electrically connected to one another, directly or indirectly. For example, these elements may be electrically connected through one or more communication buses or signal lines. The analysis device 110 for video teaching content includes at least one software function module that can be stored in the memory 111 in the form of software or firmware, or built into the operating system (OS) of the computer device 100. The processor 112 is configured to execute the analysis device 110 for video teaching content stored in the memory 111, for example the software function modules and computer programs included in the analysis device 110 for video teaching content.

An embodiment of the present invention provides a readable storage medium that includes a computer program. When the computer program runs, it controls the computer device on which the readable storage medium is located to execute the aforementioned analysis method for video teaching content.

For purposes of illustration, the foregoing description has been given with reference to specific embodiments. However, the illustrative discussion above is not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles of the present disclosure and its practical applications, thereby enabling those skilled in the art to make the best use of the present disclosure and of the various embodiments, with such modifications as suit the particular application contemplated.

Claims

1. An analysis method for video teaching content, comprising:

acquiring a teaching video to be analyzed;

performing frame extraction processing on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, wherein each candidate teaching video frame is configured with a timestamp;

determining a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames respectively; and

constructing a teaching knowledge tree as an analysis result of the teaching video to be analyzed according to the teaching catalog page, the teaching title page and the teaching content page.

2. The method of claim 1, wherein performing frame extraction processing on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames comprises:

performing frame extraction processing on the teaching video to be analyzed at a preset time interval to obtain a plurality of initial teaching video frames; and

performing de-duplication processing on the plurality of initial teaching video frames to obtain the plurality of candidate teaching video frames.

3. The method of claim 2, wherein performing de-duplication processing on the plurality of initial teaching video frames to obtain the plurality of candidate teaching video frames comprises:

calculating the structural similarity index of every two adjacent initial teaching video frames in the plurality of initial teaching video frames;

if the structural similarity index is greater than a preset similarity index threshold, deleting the later of the two initial teaching video frames, and recalculating the structural similarity index of every two adjacent initial teaching video frames among the remaining initial teaching video frames until the de-duplication process is finished;

if the structural similarity index is not greater than the preset similarity index threshold, ending the de-duplication process; and

taking the initial teaching video frames remaining after de-duplication as the plurality of candidate teaching video frames.

4. The method of claim 1, wherein determining the teaching catalog page from the plurality of candidate teaching video frames comprises:

extracting text content of a target candidate teaching video frame, wherein the target candidate teaching video frame is any one of the plurality of candidate teaching video frames;

analyzing the text content to obtain keywords included in the text content; and

in the case that the keywords include a preset catalog keyword, taking the content corresponding to the target candidate teaching video frame as the teaching catalog page.

5. The method of claim 1, wherein determining the teaching title page from the plurality of candidate teaching video frames comprises:

acquiring text font information of a target candidate teaching video frame, wherein the target candidate teaching video frame is any one of the plurality of candidate teaching video frames, and the text font information comprises font size, font position and keyword information; and

if title content is determined from the text font information according to the font size, the font position and the keyword information, taking the content corresponding to the target candidate teaching video frame as the teaching title page.

6. The method of claim 1, wherein determining the teaching content page from the plurality of candidate teaching video frames comprises:

taking the content other than the teaching catalog page and the teaching title page as the teaching content page.

7. The method of claim 1, wherein constructing a teaching knowledge tree as the analysis result of the teaching video to be analyzed according to the teaching catalog page, the teaching title page and the teaching content page comprises:

acquiring first keywords of all the teaching catalog pages, and taking the first keywords as the index basis of the teaching knowledge tree;

acquiring a second keyword of each teaching title page, and determining a key knowledge point corresponding to each teaching title page according to the second keyword;

determining the association relation among each teaching content page, each teaching catalog page and each teaching title page, and determining the position relation among the teaching catalog pages, the teaching title pages and the teaching content pages in the teaching knowledge tree to be constructed; and

constructing the teaching knowledge tree according to the index basis, the key knowledge points, the position relation and the timestamps, wherein the key knowledge points are bound to the timestamps.

8. The method of claim 7, wherein the method further comprises:

in response to a video frame modification operation, performing at least one of adding, deleting and reordering on the plurality of candidate teaching video frames;

in response to a text adjustment operation, modifying the text content included in the teaching catalog page, the teaching title page and the teaching content page; and

in response to a knowledge tree saving operation, saving the teaching knowledge tree and binding the teaching knowledge tree to the video identifier of the teaching video to be analyzed.

9. The method of claim 1, wherein acquiring the teaching video to be analyzed comprises:

setting up a synchronization mechanism with a preset teaching video database;

when it is detected that a new teaching video to be processed has been stored in the preset teaching video database, judging whether the teaching video to be processed is a recorded demonstration video and whether its video duration exceeds a preset minimum duration;

if so, taking the teaching video to be processed as the teaching video to be analyzed; and

if not, deleting the teaching video to be processed from the preset teaching video database.

10. An analysis device for video teaching content, comprising:

an acquisition module, configured to acquire a teaching video to be analyzed; and

an analysis module, configured to perform frame extraction processing on the teaching video to be analyzed to obtain a plurality of candidate teaching video frames, wherein each candidate teaching video frame is configured with a timestamp; to determine a teaching catalog page, a teaching title page and a teaching content page from the plurality of candidate teaching video frames respectively; and to construct a teaching knowledge tree as an analysis result of the teaching video to be analyzed according to the teaching catalog page, the teaching title page and the teaching content page.