CN118520882A - Medical long text question answering method and device, electronic equipment and storage medium
- Publication number
- CN118520882A (application number CN202410978469.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- question
- text
- medical
- language model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/35: Discourse or dialogue representation (under G06F40/00 Handling natural language data; G06F40/30 Semantic analysis)
- G06F16/3329: Natural language query formulation (under G06F16/00 Information retrieval; G06F16/33 Querying; G06F16/332 Query formulation)
- G06F16/3344: Query execution using natural language analysis (under G06F16/33 Querying; G06F16/3331 Query processing; G06F16/334 Query execution)
- G06F40/205: Parsing (under G06F40/00 Handling natural language data; G06F40/20 Natural language analysis)
- G06N20/00: Machine learning
Description
Technical Field
The present application relates to the field of Internet technology, and in particular to a medical long text question answering method and device, an electronic device, and a storage medium.
Background Art
Large language models (LLMs) are usually trained on context windows of a predefined size. When the context length exceeds the pre-training length, the performance of a large language model typically degrades sharply.
However, in the medical field, data usually takes the form of long text, such as case reports and diagnosis and treatment plans, and practical applications typically require document summarization, question answering, and other tasks over such text. Because large language models support only a limited context length, they lack attention results adapted to long medical text. How to obtain the attention results of a large language model is therefore a problem that urgently needs to be solved.
Summary of the Invention
The embodiments of the present application provide a medical long text question answering method and device, an electronic device, and a storage medium, to solve the technical problem of how to obtain the attention results of a large language model.
In a first aspect, an embodiment of the present application provides a medical long text question answering method, applied to an electronic device storing a large language model. The medical long text question answering method includes:
acquiring medical text data, extracting question-answer data from the medical text data, and concatenating a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length;
extracting semantic units from the long text question-answer data, inputting first embedding data of the semantic units into a RoPE encoding layer of the large language model, and acquiring an original base frequency of the RoPE encoding layer and an original scaling factor of the attention logits, the first embedding data being embedding data carrying semantic information of the semantic units;
generating a first expansion coefficient according to the target length, the maximum length of text the large language model can process in the pre-training stage, and a preset first expansion coefficient generation model, and multiplying the original base frequency by the first expansion coefficient to obtain a target base frequency;
acquiring a position index of a semantic unit, generating a second expansion coefficient according to the position index, the maximum length, and a preset second expansion coefficient generation model, and multiplying the original scaling factor by the second expansion coefficient to obtain a target scaling factor;
encoding, in the RoPE encoding layer, the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain second embedding data of the semantic units, the second embedding data being embedding data carrying both the semantic information and the relative position information of the semantic units;
dividing the second embedding data of the plurality of semantic units into a plurality of groups according to a preset grouping strategy, the grouping strategy being that the current group contains the second embedding data of a preset number of semantic units from the previous or next group;
inputting the groups into an attention layer of the large language model, performing attention computation on each current group to obtain the computation result corresponding to that group, and merging the computation results corresponding to the groups to obtain the attention result output by the attention layer.
In a possible implementation of the first aspect, acquiring the medical text data, extracting the question-answer data from the medical text data, and concatenating a plurality of pieces of the question-answer data to obtain the long text question-answer data whose text length is the target length includes:
acquiring the medical text data, extracting each question sentence from the medical text data, and combining each question sentence and the answer information corresponding to the question sentence into a question-answer pair;
composing a plurality of the question-answer pairs into question-answer data, and concatenating a plurality of pieces of the question-answer data to obtain the long text question-answer data whose text length is the target length.
In a possible implementation of the first aspect, dividing the second embedding data of the plurality of semantic units into a plurality of groups according to the preset grouping strategy includes:
dividing the context length of the long text question-answer data by the maximum length used in pre-training of the large language model and rounding up to determine the number of groups;
dividing the second embedding data of the plurality of semantic units into the plurality of groups according to the number of groups and the grouping strategy.
In a possible implementation of the first aspect, after the groups are input into the attention layer of the large language model, attention computation is performed on each current group to obtain the computation result corresponding to that group, and the computation results corresponding to the groups are merged to obtain the attention result output by the attention layer, the medical long text question answering method includes:
training the large language model by full-parameter training or LoRA fine-tuning to obtain the trained large language model.
In a possible implementation of the first aspect, the first expansion coefficient generation model is:
x = l / c;
where x is the first expansion coefficient, l is the target length, and c is the maximum length of text the large language model can process in the pre-training stage.
In a possible implementation of the first aspect, the second expansion coefficient generation model is:
y = (m / c)^(1/z);
where y is the second expansion coefficient, c is the maximum length of text the large language model can process in the pre-training stage, m is the position index of the semantic unit, and z is a preset constant.
In a possible implementation of the first aspect, the preset number is one of 1/2 of the group size, 1/3 of the group size, 1/5 of the group size, and 1/6 of the group size, or a combination thereof.
In a second aspect, an embodiment of the present application provides a medical long text question answering device, applied to an electronic device storing a large language model, including:
a first acquisition module, configured to acquire medical text data, extract question-answer data from the medical text data, and concatenate a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length;
a second acquisition module, configured to extract semantic units from the long text question-answer data, input first embedding data of the semantic units into a RoPE encoding layer of the large language model, and acquire an original base frequency of the RoPE encoding layer and an original scaling factor of the attention logits, the first embedding data being embedding data carrying semantic information of the semantic units;
a first generation module, configured to generate a first expansion coefficient according to the target length, the maximum length of text the large language model can process in the pre-training stage, and a preset first expansion coefficient generation model, and to multiply the original base frequency by the first expansion coefficient to obtain a target base frequency;
a second generation module, configured to acquire a position index of a semantic unit, generate a second expansion coefficient according to the position index, the maximum length, and a preset second expansion coefficient generation model, and multiply the original scaling factor by the second expansion coefficient to obtain a target scaling factor;
an encoding module, configured to encode, in the RoPE encoding layer, the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain second embedding data of the semantic units, the second embedding data being embedding data carrying both the semantic information and the relative position information of the semantic units;
a grouping module, configured to divide the second embedding data of the plurality of semantic units into a plurality of groups according to a preset grouping strategy, the grouping strategy being that the current group contains the second embedding data of a preset number of semantic units from the previous or next group;
an output module, configured to input the groups into an attention layer of the large language model, perform attention computation on each current group to obtain the computation result corresponding to that group, and merge the computation results corresponding to the groups to obtain the attention result output by the attention layer.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the medical long text question answering method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the medical long text question answering method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on an electronic device, causes the electronic device to execute the medical long text question answering method of the first aspect.
The embodiments of the present application are beneficial in two respects. On the one hand, the groups are input into the attention layer of the large language model, attention computation is performed on each current group to obtain the computation result corresponding to that group, and the computation results corresponding to the groups are merged to obtain the attention result output by the attention layer; the electronic device can therefore obtain the attention result of the large language model, which improves the efficiency of obtaining attention results. On the other hand, the attention result helps the large language model understand the relative positions and relationships of the different semantic units in the long text question-answer data, which improves the reasoning performance of the large language model on the long text question-answer data.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is an application scenario diagram of the medical long text question answering method provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of the medical long text question answering method provided in an embodiment of the present application;
FIG. 3 is a flowchart of obtaining long text question-answer data provided in an embodiment of the present application;
FIG. 4 is a schematic block diagram of the medical long text question answering device provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the electronic device provided in an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
It should be understood that, when used in the specification and the appended claims of the present application, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It should also be understood that the term "and/or" used in the specification and the appended claims of the present application refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in the specification and the appended claims of the present application, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third", and so on are used only to distinguish the descriptions and are not to be understood as indicating or implying relative importance.
References to "one embodiment" or "some embodiments" in the specification of the present application mean that one or more embodiments of the present application include a particular feature, structure, or characteristic described in connection with that embodiment. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "comprising", "including", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the protection scope claimed by the present application.
The flowcharts shown in the accompanying drawings are only illustrative; they need not include all the contents and operations/steps, and the operations/steps need not be executed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual execution order may change according to the actual situation.
The medical long text question answering method provided in the embodiments of the present application can be applied to electronic devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, and personal digital assistants (PDAs); the embodiments of the present application impose no restriction on the specific type of electronic device.
For example, the electronic device may be a station (STATION, ST) in a WLAN, a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a personal digital assistant (PDA) device, a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a television set-top box (STB), customer premises equipment (CPE), and/or another device for communicating over a wireless system, as well as a next-generation communication system, for example a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN).
Please refer to FIG. 1, which is an application scenario diagram of the medical long text question answering method provided in an embodiment of the present application, detailed as follows:
The electronic device acquires medical text data, extracts question-answer data from the medical text data, and concatenates a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length.
The medical text data may come from one database or from multiple databases.
For ease of explanation, take medical text data sourced from three databases as an example, detailed as follows:
The three databases are database A, database B, and database C. The electronic device connects to database A, database B, and database C respectively, and acquires medical text data from each of them.
In the embodiments of the present application, the electronic device can connect to different databases at the same time and acquire medical text data from the different databases.
Please refer to FIG. 2, which is a schematic flowchart of the medical long text question answering method provided in an embodiment of the present application. The method can be applied to an electronic device storing a large language model.
As shown in FIG. 2, the medical long text question answering method provided in the embodiments of the present application includes the following steps, detailed as follows:
S201: Acquire medical text data, extract question-answer data from the medical text data, and concatenate a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length.
Here, long text question-answer data is question-answer data in long text form.
There are many kinds of long text question-answer data; for ease of explanation, they include the following:
Drug development and clinical trial data: during drug development, researchers conduct a large number of clinical trials to evaluate the safety and effectiveness of drugs. These trials generate a large amount of text data, such as trial protocols, trial records, and result reports.
Medical education and training data: medical schools, medical institutions, and the like often organize medical education and training activities such as lectures, seminars, and online courses. These activities generate a large amount of teaching material and learning data, much of it in long text form.
In the embodiments of the present application, long text question-answer data in the medical field is of great significance for advancing medical research.
S202: Extract semantic units from the long text question-answer data, input the first embedding data of the semantic units into the RoPE encoding layer of the large language model, and acquire the original base frequency of the RoPE encoding layer and the original scaling factor of the attention logits; the first embedding data is embedding data carrying the semantic information of the semantic units.
Here, the RoPE (Rotary Position Embedding) encoding layer is a position encoding scheme used in the Transformer architecture.
S203: Generate a first expansion coefficient according to the target length, the maximum length of text the large language model can process in the pre-training stage, and a preset first expansion coefficient generation model, and multiply the original base frequency by the first expansion coefficient to obtain a target base frequency.
The first expansion coefficient generation model is:
x = l / c;
where x is the first expansion coefficient, l is the target length, and c is the maximum length of text the large language model can process in the pre-training stage.
When the target length is greater than the maximum length, the first expansion coefficient is greater than 1. Multiplying the original base frequency by the first expansion coefficient then yields a target base frequency larger than the original base frequency, achieving the technical effect of amplifying the original base frequency.
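For ease of explanation, a minimal sketch of this step is given below (Python; the function name and the default base of 10000.0 are illustrative assumptions, and the formula x = l / c is the reconstruction given above, not published reference code):

```python
def target_base_frequency(target_length: int, max_pretrain_length: int,
                          original_base: float = 10000.0) -> float:
    """Scale the original RoPE base frequency by the first expansion
    coefficient x = l / c to obtain the target base frequency."""
    x = target_length / max_pretrain_length   # first expansion coefficient
    return x * original_base                  # target base frequency


# Extending a model pre-trained on 4096 tokens to 10000 tokens:
# x = 10000 / 4096 ~ 2.441, so a base of 10000.0 becomes ~ 24414.06.
print(target_base_frequency(10000, 4096))
```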
S204: Acquire the position index of each semantic unit, generate a second expansion coefficient according to the position index, the maximum length, and a preset second expansion coefficient generation model, and multiply the original scaling factor by the second expansion coefficient to obtain a target scaling factor.
The second expansion coefficient generation model is:
y = (m / c)^(1/z);
where y is the second expansion coefficient, c is the maximum length of text the large language model can process in the pre-training stage, m is the position index of the semantic unit, and z is a preset constant.
When the position index is greater than the maximum length, the second expansion coefficient is greater than 1. Multiplying the original scaling factor by the second expansion coefficient then yields a target scaling factor larger than the original scaling factor, achieving the technical effect of amplifying the original scaling factor.
The value of z is set by the user or defaults to a system value, and is not restricted here.
For ease of explanation, examples are given below:
For example, with z = 1, the second expansion coefficient generation model is:
y = m / c;
with z = 2, the second expansion coefficient generation model is:
y = (m / c)^(1/2);
In the embodiments of the present application, as the task of the large language model changes, the preset constant can be readjusted to obtain the best performance.
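Continuing the sketch (the power-law form of y is the reconstruction adopted above, and the names are illustrative, not from the patent):

```python
def target_scaling_factor(position_index: int, max_pretrain_length: int,
                          z: float = 1.0, original_scale: float = 1.0) -> float:
    """Scale the original attention-logit scaling factor by the second
    expansion coefficient y = (m / c) ** (1 / z)."""
    y = (position_index / max_pretrain_length) ** (1.0 / z)
    return y * original_scale                 # target scaling factor


# With z = 1, a position twice past the pre-training window doubles the
# scaling factor; with z = 2, the growth is dampened to sqrt(m / c).
print(target_scaling_factor(8192, 4096, z=1))  # 2.0
print(target_scaling_factor(8192, 4096, z=2))  # ~1.414
```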
S205: In the RoPE encoding layer, encode the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain second embedding data of the semantic units; the second embedding data is embedding data carrying both the semantic information and the relative position information of the semantic units.
Exemplarily, encoding, in the RoPE encoding layer, the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain the second embedding data of the semantic units includes:
in the RoPE encoding layer, encoding the first embedding data of the semantic units based on an extended form of RoPE encoding to obtain the second embedding data of the semantic units, the extended form of RoPE encoding including the target base frequency and the target scaling factor.
For ease of explanation, the extended form of RoPE encoding is as follows:
θ(m, j) = m · (x · b)^(-2j/d), with the attention logits multiplied by y · t;
where m is the position index of the token, b is the original base frequency of RoPE, d is the dimension of the embedding vector, j is the index of the dimension, and t is the original scaling factor of the attention logits.
Here, x is the first expansion coefficient and y is the second expansion coefficient.
x · b is the target base frequency.
y · t is the target scaling factor.
As j increases, (x · b)^(2j/d) also increases, because x · b > 1 and d is a constant.
As (x · b)^(2j/d) increases, the rotation angle θ(m, j) = m · (x · b)^(-2j/d) decreases.
Therefore, as j increases, cos(θ(m, j)) approaches 1 and sin(θ(m, j)) approaches 0.
Since the target base frequency amplifies the original base frequency and the target scaling factor amplifies the original scaling factor, the extended form of RoPE encoding can enhance the large language model's sensitivity to, and expressive power over, the relative position information of the semantic units, as detailed below:
Enhanced position sensitivity: when the base frequency is increased, the differences between different position indices are reflected more distinctly in the rotation angles. This means the model can more easily distinguish the relative position information of different semantic units.
Improved expressive power: when the target scaling factor is increased, even a small change in position can produce a large change in rotation angle, which helps the model capture finer-grained position dependencies.
Flexibility: the target base frequency and the target scaling factor flexibly control the properties of the RoPE encoding to suit different application scenarios. For example, when processing long sequences, they help improve the large language model's ability to model long-range dependencies.
In the embodiments of the present application, the target base frequency and the target scaling factor enhance the sensitivity and expressive power of RoPE encoding in representing position information, enabling the large language model to better capture position dependencies in sequence data and improving its performance in various application scenarios.
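For ease of explanation, a compact sketch of how the RoPE encoding layer could apply the target base frequency and target scaling factor is given below (NumPy; the patent publishes no reference code, so the function and its interface are assumptions):

```python
import numpy as np

def rope_encode(first_embedding: np.ndarray, target_base: float,
                target_scale: float):
    """Rotate each (even, odd) dimension pair of a [seq_len, d] embedding by
    the angle m * target_base**(-2j/d); returns the position-aware second
    embedding and the factor to apply to the attention logits."""
    seq_len, d = first_embedding.shape
    j = np.arange(d // 2)
    inv_freq = target_base ** (-2.0 * j / d)           # per-pair frequency
    angles = np.outer(np.arange(seq_len), inv_freq)    # [seq_len, d/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = first_embedding[:, 0::2], first_embedding[:, 1::2]
    second_embedding = np.empty_like(first_embedding)
    second_embedding[:, 0::2] = x1 * cos - x2 * sin    # rotated pairs carry
    second_embedding[:, 1::2] = x1 * sin + x2 * cos    # relative position info
    return second_embedding, target_scale
```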
S206: Divide the second embedding data of the plurality of semantic units into a plurality of groups according to a preset grouping strategy, the grouping strategy being that the current group contains the second embedding data of a preset number of semantic units from the previous or next group.
Here, dividing the second embedding data of the plurality of semantic units into a plurality of groups according to the preset grouping strategy includes:
dividing the context length of the long text question-answer data by the maximum length used in pre-training of the large language model and rounding up to determine the number of groups;
dividing the second embedding data of the plurality of semantic units into the plurality of groups according to the number of groups and the grouping strategy.
The preset number is one of 1/2 of the group size, 1/3 of the group size, 1/5 of the group size, and 1/6 of the group size, or a combination thereof.
For ease of explanation, an example is given below:
For example, let N be the number of tokens contained in each group. To extend the large language model's text-processing capability to 10K while saving computing resources, the input long text question-answer data is first divided into several groups for intra-group computation. The number of groups is calculated as 10000 / 4096 = 2.441, which rounds up to 3; the number of groups must be an integer, and rounding down would lose information.
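A one-line computation reproduces this count (Python):

```python
import math

def number_of_groups(context_length: int, max_pretrain_length: int) -> int:
    """Round up so that no part of the long input is dropped."""
    return math.ceil(context_length / max_pretrain_length)

print(number_of_groups(10000, 4096))  # 10000 / 4096 = 2.441 -> 3
```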
To preserve the flow and continuity of information after grouping, a preset number of the second embedding data of the semantic units is retained at the boundaries between adjacent groups; with the preset number set to N/2, the groups are as follows:
G = {[0, N], [N/2, 3N/2], [N, 2N], ..., [l - 3N/2, l - N/2], [l - N, l]};
where G is the set of groups.
[0, N] indicates that the 0th token through the Nth token form one group.
l is the target length, and [l - N, l] is the last group.
[l - 3N/2, l - N/2] is the group immediately before the last group.
For ease of explanation, an example is given below:
For example, when N is 4096, so that adjacent groups share information, the first group contains the 1st through the 4096th semantic units.
The second group partially overlaps the first group and contains the 2048th through the 6144th semantic units.
The semantic units from 2048 to 4096 therefore appear in both the first group and the second group, and subsequent groups are formed in the same way.
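For ease of explanation, a sketch of the grouping construction is given below (Python; the function name is illustrative, and the half-group overlap follows the N/2 preset number used in the example above):

```python
def make_overlapping_groups(num_tokens: int, group_size: int, overlap: int):
    """Split token indices [0, num_tokens) into groups of group_size tokens,
    where each group starts group_size - overlap after the previous one."""
    stride = group_size - overlap
    groups, start = [], 0
    while start + group_size < num_tokens:
        groups.append((start, start + group_size))
        start += stride
    groups.append((max(0, num_tokens - group_size), num_tokens))  # last group
    return groups

# N = 4096 with overlap N/2 reproduces the example:
# [(0, 4096), (2048, 6144), (4096, 8192), (5904, 10000)]
print(make_overlapping_groups(10000, 4096, 2048))
```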
S207: Input the groups into the attention layer of the large language model, perform attention computation on each current group to obtain the computation result corresponding to that group, and merge the computation results corresponding to the groups to obtain the attention result output by the attention layer.
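For ease of explanation, one possible form of the per-group computation and merge is sketched below (NumPy; the patent does not fix a merge rule, so averaging the overlapping positions is an assumption):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_attention(q, k, v, groups, target_scale: float = 1.0):
    """Run scaled dot-product attention inside each (start, end) group and
    merge the per-group results; overlapping positions are averaged."""
    seq_len, d = q.shape
    merged = np.zeros((seq_len, d))
    counts = np.zeros((seq_len, 1))
    for start, end in groups:
        qi, ki, vi = q[start:end], k[start:end], v[start:end]
        logits = target_scale * (qi @ ki.T) / np.sqrt(d)  # scaled logits
        merged[start:end] += softmax(logits) @ vi          # group result
        counts[start:end] += 1
    return merged / counts  # attention result output by the layer
```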
After the groups are input into the attention layer of the large language model, attention computation is performed on each current group to obtain the computation result corresponding to that group, and the computation results corresponding to the groups are merged to obtain the attention result output by the attention layer; after this, the medical long text question answering method includes:
training the large language model by full-parameter training or LoRA fine-tuning to obtain the trained large language model.
For ease of explanation, an example is given below:
For example, the attention result is passed to the fully connected layer of the large language model, and the predicted answer output by the fully connected layer based on the attention result is obtained. The loss value of the loss function is calculated from the comparison between the predicted answer and the preset answer of the question-answer data. When the loss value is greater than a preset value, the model parameters of the large language model are updated by full-parameter training or LoRA fine-tuning, and the large language model with updated parameters is trained on the question-answer data; training stops only when the loss value is less than the preset value, yielding the trained large language model.
The attention result helps the large language model understand the relative positions and relationships of the different semantic units in the long text question-answer data, which improves the reliability and soundness of the predicted answers output by the fully connected layer based on the attention result, and in turn improves the reasoning performance of the large language model on the long text question-answer data.
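For ease of explanation, a schematic training loop under the loss-threshold stopping rule described above is given below (PyTorch-style; the model, optimizer, loss function, data iterator, and threshold are all placeholders, and any LoRA wrapping would happen before this loop):

```python
import torch

def train_until_threshold(model, optimizer, loss_fn, data_iter, preset_value):
    """Update the (full-parameter or LoRA-wrapped) model on the question-answer
    data until the loss value falls below the preset value."""
    model.train()
    for attention_input, preset_answer in data_iter:
        optimizer.zero_grad()
        predicted_answer = model(attention_input)   # fully connected head output
        loss = loss_fn(predicted_answer, preset_answer)
        loss.backward()
        optimizer.step()
        if loss.item() < preset_value:              # stop once below preset value
            break
    return model
```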
After the large language model is trained by full-parameter training or LoRA fine-tuning to obtain the trained large language model, the medical long text question answering method includes:
connecting to a medical question answering system and acquiring a medical dialogue in the medical question answering system;
acquiring the current question in the medical dialogue, extracting the feature vector of the current question, inputting the feature vector into the trained large language model, and acquiring the current answer output by the trained large language model based on the feature vector.
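A minimal sketch of this inference step (the feature extractor and model interface are placeholder assumptions):

```python
def answer_current_question(trained_model, extract_features, question: str):
    """Extract the feature vector of the current question from the medical
    dialogue and return the trained model's current answer."""
    feature_vector = extract_features(question)   # placeholder extractor
    return trained_model(feature_vector)          # current answer
```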
The embodiments of the present application are beneficial in two respects. On the one hand, the groups are input into the attention layer of the large language model, attention computation is performed on each current group to obtain the computation result corresponding to that group, and the computation results corresponding to the groups are merged to obtain the attention result output by the attention layer; the electronic device can therefore obtain the attention result of the large language model, which improves the efficiency of obtaining attention results. On the other hand, the attention result helps the large language model understand the relative positions and relationships of the different semantic units in the long text question-answer data, which improves the reasoning performance of the large language model on the long text question-answer data.
Please refer to FIG. 3, which is a flowchart of obtaining long text question-answer data provided in an embodiment of the present application, detailed as follows:
S301: Acquire the medical text data, extract each question sentence from the medical text data, and combine each question sentence and the answer information corresponding to the question sentence into a question-answer pair.
S302: Compose a plurality of the question-answer pairs into question-answer data, and concatenate a plurality of pieces of the question-answer data to obtain the long text question-answer data whose text length is the target length.
For ease of explanation, an example is given below:
For example, question-answer data A in the medical field contains 5000 question-answer pairs;
question-answer data B in the medical field contains 3000 question-answer pairs;
concatenating question-answer data A and question-answer data B yields long text question-answer data whose text length is the target length and which contains 8000 question-answer pairs.
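For ease of explanation, a sketch of the concatenation is given below (Python; the Q:/A: formatting and the length-by-characters stopping rule are illustrative assumptions):

```python
def concatenate_qa_data(datasets, target_length: int) -> str:
    """Merge question-answer pairs from several datasets, keeping each pair's
    original content, until the combined text reaches the target length."""
    pieces, total = [], 0
    for dataset in datasets:            # e.g. data A (5000 pairs), B (3000 pairs)
        for question, answer in dataset:
            pair_text = f"Q: {question}\nA: {answer}"
            pieces.append(pair_text)
            total += len(pair_text)
            if total >= target_length:  # text length reaches the target length
                return "\n".join(pieces)
    return "\n".join(pieces)
```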
Long text question-answer data in the medical field ensures that each question-answer pair keeps its original content while being merged into one larger body of long text question-answer data. The long text question-answer data thus contains a wide variety of questions and answers from the medical field, forming a rich and diverse question-answer dataset.
In the embodiments of the present application, long text question-answer data provides the large language model with rich contextual information, enabling it to understand and infer the intent and logical relationships behind a question more accurately. By being exposed to a large number of question-answer pairs, the large language model can learn how to generate logically sound answers in different contexts, improving the accuracy and relevance of its answers.
Corresponding to the medical long text question answering method described in the above embodiments, please refer to FIG. 4, which is a schematic block diagram of the medical long text question answering device provided in an embodiment of the present application. The medical long text question answering device 400 shown in FIG. 4 can be applied to the electronic device in the application scenario diagram shown in FIG. 1. Taking the electronic device as an example, the medical long text question answering device 400 shown in FIG. 4 is described in detail below; it may include a first acquisition module 401, a second acquisition module 402, a first generation module 403, a second generation module 404, an encoding module 405, a grouping module 406, and an output module 407.
The first acquisition module 401 is configured to acquire medical text data, extract question-answer data from the medical text data, and concatenate a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length.
The second acquisition module 402 is configured to extract semantic units from the long text question-answer data, input the first embedding data of the semantic units into the RoPE encoding layer of the large language model, and acquire the original base frequency of the RoPE encoding layer and the original scaling factor of the attention logits, the first embedding data being embedding data carrying the semantic information of the semantic units.
The first generation module 403 is configured to generate a first expansion coefficient according to the target length, the maximum length of text the large language model can process in the pre-training stage, and a preset first expansion coefficient generation model, and to multiply the original base frequency by the first expansion coefficient to obtain a target base frequency.
The second generation module 404 is configured to acquire the position index of a semantic unit, generate a second expansion coefficient according to the position index, the maximum length, and a preset second expansion coefficient generation model, and multiply the original scaling factor by the second expansion coefficient to obtain a target scaling factor.
The encoding module 405 is configured to encode, in the RoPE encoding layer, the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain second embedding data of the semantic units, the second embedding data being embedding data carrying both the semantic information and the relative position information of the semantic units.
The grouping module 406 is configured to divide the second embedding data of the plurality of semantic units into a plurality of groups according to a preset grouping strategy, the grouping strategy being that the current group contains the second embedding data of a preset number of semantic units from the previous or next group.
The output module 407 is configured to input the groups into the attention layer of the large language model, perform attention computation on each current group to obtain the computation result corresponding to that group, and merge the computation results corresponding to the groups to obtain the attention result output by the attention layer.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced.
The embodiments of the present application are beneficial in two respects. On the one hand, the groups are input into the attention layer of the large language model, attention computation is performed on each current group to obtain the computation result corresponding to that group, and the computation results corresponding to the groups are merged to obtain the attention result output by the attention layer; the electronic device can therefore obtain the attention result of the large language model, which improves the efficiency of obtaining attention results. On the other hand, the attention result helps the large language model understand the relative positions and relationships of the different semantic units in the long text question-answer data, which improves the reasoning performance of the large language model on the long text question-answer data.
Please refer to FIG. 5, which is a schematic structural diagram of the electronic device provided in an embodiment of the present application.
As shown in FIG. 5, the electronic device 2 includes at least one processor 20, a memory 21, and a computer program 22 stored in the memory 21 and executable on the at least one processor 20; when the processor 20 executes the computer program 22, the steps in any of the above method embodiments are implemented.
The electronic device 2 may include, but is not limited to, the processor 20 and the memory 21. Those skilled in the art will appreciate that FIG. 5 is merely an example of the electronic device 2 and does not limit it; the electronic device 2 may include more or fewer components than shown, combine certain components, or use different components, and may, for example, also include input/output devices, network access devices, and the like.
The processor 20 is configured to run the computer program 22 stored in the memory 21 and, when executing the computer program 22, implements the following steps:
acquiring medical text data, extracting question-answer data from the medical text data, and concatenating a plurality of pieces of the question-answer data to obtain long text question-answer data whose text length is a target length;
extracting semantic units from the long text question-answer data, inputting the first embedding data of the semantic units into the RoPE encoding layer of the large language model, and acquiring the original base frequency of the RoPE encoding layer and the original scaling factor of the attention logits, the first embedding data being embedding data carrying the semantic information of the semantic units;
generating a first expansion coefficient according to the target length, the maximum length of text the large language model can process in the pre-training stage, and a preset first expansion coefficient generation model, and multiplying the original base frequency by the first expansion coefficient to obtain a target base frequency;
acquiring the position index of a semantic unit, generating a second expansion coefficient according to the position index, the maximum length, and a preset second expansion coefficient generation model, and multiplying the original scaling factor by the second expansion coefficient to obtain a target scaling factor;
encoding, in the RoPE encoding layer, the first embedding data of the semantic units based on the target base frequency and the target scaling factor to obtain second embedding data of the semantic units, the second embedding data being embedding data carrying both the semantic information and the relative position information of the semantic units; dividing the second embedding data of the plurality of semantic units into a plurality of groups according to a preset grouping strategy, the grouping strategy being that the current group contains the second embedding data of a preset number of semantic units from the previous or next group; and inputting the groups into the attention layer of the large language model, performing attention computation on each current group to obtain the computation result corresponding to that group, and merging the computation results corresponding to the groups to obtain the attention result output by the attention layer.
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
获取所述医学文本数据,在所述医学文本数据中提取每个问句,将所述问句及所述问句对应的答案信息组合成问答对;Acquire the medical text data, extract each question sentence from the medical text data, and combine the question sentence and the answer information corresponding to the question sentence into a question-answer pair;
将多个所述问答对组成问答数据,将多个所述问答数据进行拼接,得到所述文本长度为所述目标长度的所述长文本问答数据。Multiple question-answer pairs are combined into question-answer data, and multiple question-answer data are concatenated to obtain the long text question-answer data whose text length is the target length.
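A minimal sketch of this concatenation step, assuming the medical text has already been parsed into (question, answer) string pairs; the length measure and the prompt format are placeholder choices, not part of the embodiment:

def build_long_qa_text(qa_pairs, target_len, measure=len):
    """Concatenate question-answer pairs until the combined text reaches
    the target length. `measure` counts characters by default; a
    tokenizer's length function could be substituted."""
    pieces, total = [], 0
    for question, answer in qa_pairs:
        qa_text = f"问：{question}\n答：{answer}\n"
        pieces.append(qa_text)
        total += measure(qa_text)
        if total >= target_len:
            break
    return "".join(pieces)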
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
将所述长文本问答数据的上下文长度除以所述大语言模型预训练的所述最大长度并向上取整,确定分组数量;Divide the context length of the long text question-answering data by the maximum length of the pre-trained large language model and round up to determine the number of groups;
根据所述分组数量和所述分组策略,将多个所述语义单元的所述第二嵌入数据分成多个所述分组。The second embedded data of a plurality of the semantic units are divided into a plurality of the groups according to the grouping number and the grouping strategy.
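The rounding-up division above maps directly onto integer arithmetic; a brief illustration (variable names are ours):

def num_groups(context_len: int, max_pretrain_len: int) -> int:
    # ceil(context_len / max_pretrain_len), using integer arithmetic only
    return -(-context_len // max_pretrain_len)

# e.g. num_groups(10000, 4096) == 3: a 10k-token context splits into 3 groups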
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
选择全参数训练或者LoRA微调的方式，对所述大语言模型进行训练，得到训练后的所述大语言模型。Select either full-parameter training or LoRA fine-tuning to train the large language model, obtaining the trained large language model.
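To illustrate the two options, the sketch below sets up LoRA fine-tuning with the Hugging Face peft library; the library choice, the placeholder model id, and the hyperparameter values are our assumptions, not requirements of the embodiment:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder model id

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=32,                         # scaling applied to the LoRA update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter weights train
# For full-parameter training, skip the wrapping and fine-tune `model` directly.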
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
所述第一扩大系数生成模型为:The first expansion coefficient generation model is:
（该公式在原文中以插图形式给出，文本提取未能保留其内容。）(The formula is given as an image in the original publication and did not survive text extraction.)
其中，所述x为所述第一扩大系数，模型的另一输入为所述目标长度，所述c为所述大语言模型在预训练阶段处理文本的所述最大长度。Wherein, x is the first expansion coefficient, the model's other input is the target length, and c is the maximum length of text the large language model processes in the pre-training stage.
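Since the formula itself is not reproduced above, the following sketch substitutes a commonly used NTK-style scaling as a stand-in so that the data flow is at least visible: the first expansion coefficient x is computed from the target length and the pre-training maximum length, then multiplied onto the original RoPE base frequency. The exponent form is an assumption, not the patent's formula:

import torch

def rope_inverse_frequencies(base: float, head_dim: int) -> torch.Tensor:
    # standard RoPE schedule: theta_i = base^(-2i/d) for i = 0 .. d/2 - 1
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return base ** (-exponents)

def target_base_frequency(orig_base: float, target_len: int,
                          max_pretrain_len: int, head_dim: int) -> float:
    # ASSUMED stand-in for the first-coefficient model: NTK-style scaling,
    # chosen so the lowest RoPE frequency stretches to the target length.
    x = (target_len / max_pretrain_len) ** (head_dim / (head_dim - 2))
    return orig_base * x  # target base = first expansion coefficient x original base

# usage: inv_freq = rope_inverse_frequencies(
#            target_base_frequency(10000.0, 16384, 4096, 128), 128)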
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
所述第二扩大系数生成模型为:The second expansion coefficient generation model is:
（该公式在原文中以插图形式给出，文本提取未能保留其内容。）(The formula is given as an image in the original publication and did not survive text extraction.)
其中,所述y为所述第二扩大系数,所述c为所述大语言模型在预训练阶段处理文本的所述最大长度,所述m为所述语义单元的所述位置索引,所述z为预设常数。Among them, y is the second expansion coefficient, c is the maximum length of the text processed by the large language model in the pre-training stage, m is the position index of the semantic unit, and z is a preset constant.
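Likewise, with the concrete formula unavailable, the sketch below stands in the well-known log-length ("LogN") attention-logit scaling: the second expansion coefficient y grows with the position index m relative to the pre-training length c, and the preset constant z is treated here as a lower bound. Both choices are our assumptions:

import math

def target_scaling_factor(orig_scale: float, m: int,
                          max_pretrain_len: int, z: float = 1.0) -> float:
    # ASSUMED stand-in for the second-coefficient model: LogN-style
    # attention-logit scaling, floored at the preset constant z.
    y = max(z, math.log(m + 1) / math.log(max_pretrain_len))
    return orig_scale * y  # target factor = second expansion coefficient x original factor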
在一些实施例中,处理器20,用于实现:In some embodiments, the processor 20 is configured to implement:
所述预设数量包括分组大小的1/2、分组大小的1/3、分组大小的1/5、分组大小的1/6中的其中一种或其组合。The preset number includes one of 1/2 of the group size, 1/3 of the group size, 1/5 of the group size, 1/6 of the group size, or a combination thereof.
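In code, each preset number is simply an integer fraction of the group size (example values only; the variable names are ours):

group_size = 4096  # example: the model's pre-training maximum length c
preset_overlaps = {f"1/{k}": group_size // k for k in (2, 3, 5, 6)}
# {'1/2': 2048, '1/3': 1365, '1/5': 819, '1/6': 682}
# the current group shares that many semantic units' second embedding data
# with its neighbouring group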
所称处理器20可以是中央处理单元(Central Processing Unit,CPU),该处理器20还可以是其他通用处理器、数字信号处理器 (Digital Signal Processor,DSP)、专用集成电路 (Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA) 或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 20 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor, etc.
所述存储器21在一些实施例中可以是所述电子设备2的内部存储单元,例如电子设备2的硬盘或内存。所述存储器21在另一些实施例中也可以是所述电子设备2的外部存储设备,例如所述电子设备2上配备的插接式硬盘,智能存储卡(Smart Media Card, SMC),安全数字(Secure Digital, SD)卡,闪存卡(Flash Card)等。进一步地,所述存储器21还可以既包括所述电子设备2的内部存储单元也包括外部存储设备。所述存储器21用于存储操作系统、应用程序、引导装载程序(Boot Loader)、数据以及其他程序等,例如所述计算机程序的程序代码等。所述存储器21还可以用于暂时地存储已经输出或者将要输出的数据。In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or memory of the electronic device 2. In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc. equipped on the electronic device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the electronic device 2. The memory 21 is used to store an operating system, an application program, a boot loader, data, and other programs, such as the program code of the computer program. The memory 21 may also be used to temporarily store data that has been output or is to be output.
需要说明的是,上述装置/单元之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其具体功能及带来的技术效果,具体可参见方法实施例部分,此处不再赘述。It should be noted that the information interaction, execution process, etc. between the above-mentioned devices/units are based on the same concept as the method embodiment of the present application. Their specific functions and technical effects can be found in the method embodiment part and will not be repeated here.
本申请实施例提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时可实现上述各个方法实施例中的步骤。An embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.
计算机可读存储介质中存储有程序代码,程序代码可被处理器调用执行上述方法实施例中所描述的医学长文本问答方法。The computer-readable storage medium stores program codes, which can be called by a processor to execute the medical long text question-answering method described in the above method embodiment.
计算机可读存储介质具有程序代码的存储空间。The computer-readable storage medium has a storage space for program codes.
程序代码包括上述方法实施例中所描述的医学长文本问答方法中的任何步骤的代码。The program code includes the code of any step in the medical long text question answering method described in the above method embodiment.
例如,程序代码被处理器调用,可以执行如下步骤:For example, the program code is called by the processor and can execute the following steps:
获取医学文本数据,在所述医学文本数据中提取问答数据,将多个所述问答数据进行拼接处理,得到文本长度为目标长度的长文本问答数据;Acquire medical text data, extract question and answer data from the medical text data, and concatenate multiple question and answer data to obtain long text question and answer data with a target length;
在所述长文本问答数据中提取出语义单元，将所述语义单元的第一嵌入数据输入到所述大语言模型的RoPE编码层中，获取所述RoPE编码层的原始基础频率和注意力对数的原始缩放因子，所述第一嵌入数据为携带了所述语义单元的语义信息的嵌入数据；Extract semantic units from the long text question-answering data, input the first embedding data of the semantic units into the RoPE coding layer of the large language model, and obtain the original base frequency of the RoPE coding layer and the original scaling factor of the attention logits, the first embedding data being embedding data that carries the semantic information of the semantic units;
根据所述目标长度、所述大语言模型在预训练阶段处理文本的最大长度以及预设的第一扩大系数生成模型,生成第一扩大系数,将所述第一扩大系数乘以所述原始基础频率,得到目标基础频率;Generate a first expansion coefficient according to the target length, the maximum length of the text processed by the large language model in the pre-training stage, and a preset first expansion coefficient generation model, and multiply the first expansion coefficient by the original basic frequency to obtain a target basic frequency;
获取所述语义单元的位置索引,根据所述位置索引、所述最大长度以及预设的第二扩大系数生成模型,生成第二扩大系数,将所述第二扩大系数乘以所述原始缩放因子,得到目标缩放因子;Obtaining a position index of the semantic unit, generating a second expansion factor according to the position index, the maximum length, and a preset second expansion factor generation model, and multiplying the second expansion factor by the original scaling factor to obtain a target scaling factor;
在所述RoPE编码层中,基于所述目标基础频率和所述目标缩放因子,对所述语义单元的所述第一嵌入数据进行编码,得到所述语义单元的第二嵌入数据,所述第二嵌入数据为携带了所述语义单元的所述语义信息以及相对位置信息的嵌入数据;In the RoPE coding layer, based on the target base frequency and the target scaling factor, the first embedded data of the semantic unit is encoded to obtain second embedded data of the semantic unit, where the second embedded data is embedded data carrying the semantic information and relative position information of the semantic unit;
按预设的分组策略,将多个所述语义单元的所述第二嵌入数据分成多个分组,所述分组策略为当前所述分组包含前一个或后一个所述分组中预设数量的所述语义单元的所述第二嵌入数据;According to a preset grouping strategy, the second embedded data of the plurality of semantic units are divided into a plurality of groups, wherein the grouping strategy is that the current group contains a preset number of the second embedded data of the semantic units in the previous or next group;
将所述分组输入到所述大语言模型的注意力层中,对当前所述分组进行注意力计算,得到所述分组对应的计算结果,将所述分组对应的所述计算结果进行合并,得到所述注意力层输出的注意力结果。The group is input into the attention layer of the large language model, attention calculation is performed on the current group to obtain the calculation result corresponding to the group, and the calculation results corresponding to the group are merged to obtain the attention result output by the attention layer.
以上各个操作的具体实施可参见前面的实施例,在此不再赘述。The specific implementation of the above operations can be found in the previous embodiments, which will not be described in detail here.
其中，所述计算机可读存储介质也可以是医学长文本问答装置或电子设备的外部存储设备，例如，医学长文本问答装置或电子设备上配备的插接式硬盘，智能存储卡(Smart Media Card, SMC)，安全数字(Secure Digital, SD)卡，闪存卡(Flash Card)，非易失性计算机可读介质(non-transitory computer-readable storage medium)等。The computer-readable storage medium may also be an external storage device of the medical long-text question-answering apparatus or electronic device, for example, a plug-in hard disk equipped on the apparatus or device, a smart media card (SMC), a secure digital (SD) card, a flash card, or a non-transitory computer-readable storage medium, etc.
由于该计算机可读存储介质中所存储的计算机程序，可以执行本申请实施例所提供的任一种医学长文本问答方法，因此该计算机可读存储介质可以实现本申请实施例所提供的任一种医学长文本问答方法所能实现的有益效果，详见前面的实施例，在此不再赘述。Since the computer program stored in the computer-readable storage medium can execute any of the medical long-text question-answering methods provided in the embodiments of the present application, the storage medium can achieve the beneficial effects achievable by any of those methods. For details, see the foregoing embodiments, which will not be repeated here.
本申请实施例提供了一种计算机程序产品,当计算机程序产品在电子设备上运行时,使得电子设备执行上述的医学长文本问答方法。An embodiment of the present application provides a computer program product. When the computer program product is run on an electronic device, the electronic device executes the above-mentioned medical long text question and answer method.
计算机程序产品被电子设备加载,可以执行如下步骤:The computer program product is loaded into the electronic device and can execute the following steps:
获取医学文本数据,在所述医学文本数据中提取问答数据,将多个所述问答数据进行拼接处理,得到文本长度为目标长度的长文本问答数据;Acquire medical text data, extract question and answer data from the medical text data, and concatenate multiple question and answer data to obtain long text question and answer data with a target length;
在所述长文本问答数据中提取出语义单元,将所述语义单元的第一嵌入数据输入到所述大语言模型的RoPE编码层中,获取所述RoPE编码层的原始基础频率和注意力对数的原始缩放因子,所述第一嵌入数据为携带了所述语义单元的语义信息的嵌入数据;Extracting semantic units from the long text question-answering data, inputting first embedding data of the semantic units into the RoPE coding layer of the large language model, and obtaining the original base frequency and the original scaling factor of the attention logarithm of the RoPE coding layer, wherein the first embedding data is embedding data carrying semantic information of the semantic unit;
根据所述目标长度、所述大语言模型在预训练阶段处理文本的最大长度以及预设的第一扩大系数生成模型,生成第一扩大系数,将所述第一扩大系数乘以所述原始基础频率,得到目标基础频率;Generate a first expansion coefficient according to the target length, the maximum length of the text processed by the large language model in the pre-training stage, and a preset first expansion coefficient generation model, and multiply the first expansion coefficient by the original basic frequency to obtain a target basic frequency;
获取所述语义单元的位置索引,根据所述位置索引、所述最大长度以及预设的第二扩大系数生成模型,生成第二扩大系数,将所述第二扩大系数乘以所述原始缩放因子,得到目标缩放因子;Obtaining a position index of the semantic unit, generating a second expansion factor according to the position index, the maximum length, and a preset second expansion factor generation model, and multiplying the second expansion factor by the original scaling factor to obtain a target scaling factor;
在所述RoPE编码层中,基于所述目标基础频率和所述目标缩放因子,对所述语义单元的所述第一嵌入数据进行编码,得到所述语义单元的第二嵌入数据,所述第二嵌入数据为携带了所述语义单元的所述语义信息以及相对位置信息的嵌入数据;In the RoPE coding layer, based on the target base frequency and the target scaling factor, the first embedded data of the semantic unit is encoded to obtain second embedded data of the semantic unit, where the second embedded data is embedded data carrying the semantic information and relative position information of the semantic unit;
按预设的分组策略,将多个所述语义单元的所述第二嵌入数据分成多个分组,所述分组策略为当前所述分组包含前一个或后一个所述分组中预设数量的所述语义单元的所述第二嵌入数据;According to a preset grouping strategy, the second embedded data of the plurality of semantic units are divided into a plurality of groups, wherein the grouping strategy is that the current group contains a preset number of the second embedded data of the semantic units in the previous or next group;
将所述分组输入到所述大语言模型的注意力层中,对当前所述分组进行注意力计算,得到所述分组对应的计算结果,将所述分组对应的所述计算结果进行合并,得到所述注意力层输出的注意力结果。The group is input into the attention layer of the large language model, attention calculation is performed on the current group to obtain the calculation result corresponding to the group, and the calculation results corresponding to the group are merged to obtain the attention result output by the attention layer.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中，上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。另外，各功能单元、模块的具体名称也只是为了便于相互区分，并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art will clearly appreciate that, for convenience and brevity of description, only the division into the functional units and modules described above is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
基于这样的理解，本申请实现上述实施例方法中的全部或部分流程，可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于计算机可读存储介质中，该计算机程序在被处理器执行时，可实现上述各个方法实施例的步骤。其中，所述计算机程序包括计算机程序代码，所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质至少可以包括：能够将计算机程序代码携带到电子设备的任何实体或装置、记录介质、计算机存储器、只读存储器(ROM, Read-Only Memory)、随机存取存储器(RAM, Random Access Memory)、电载波信号、电信信号以及软件分发介质，例如U盘、移动硬盘、磁碟或者光盘等。Based on this understanding, all or part of the processes in the above embodiment methods may be implemented by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application, and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the present application specification and drawings, or directly or indirectly applied in other related technical fields, are also included in the patent protection scope of the present application.