
CN120258951A - A financial risk prediction method based on multimodal data fusion - Google Patents

A financial risk prediction method based on multimodal data fusion

Info

Publication number
CN120258951A
Authority
CN
China
Prior art keywords
vector
text
log
audio
financial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510343420.8A
Other languages
Chinese (zh)
Inventor
罗惠麟
夏喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202510343420.8A priority Critical patent/CN120258951A/en
Publication of CN120258951A publication Critical patent/CN120258951A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/041 Abduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The present invention proposes a financial risk prediction method based on multimodal data fusion, belonging to the technical field of artificial intelligence risk prediction. The method includes: S1, extracting an audio feature vector from the earnings conference call of the target company; S2, extracting a text minutes feature vector from the earnings conference call of the target company; S3, summarizing the text minutes of the earnings conference call of the target company with a large language model to obtain an embedding vector corresponding to the summary of the minutes; S4, extracting features from the target company's news text to obtain a news text feature vector; S5, extracting features from the target company's time-series transaction data over a period before the target date to obtain a time-series feature vector; S6, obtaining a joint representation vector; and S7, inputting the joint representation vector into a multi-task learning framework to predict risk indicators. The invention significantly improves prediction accuracy, makes full use of implicit information, and has strong multi-task prediction capability.

Description

Financial risk prediction method based on multimodal data fusion
Technical Field
The invention relates to the technical field of artificial intelligence risk prediction, and in particular to a financial risk prediction method based on multimodal data fusion.
Background
With the rapid development of artificial intelligence technology, the level of intelligence in the financial industry has improved significantly. In the field of financial risk prediction, artificial intelligence can not only improve the efficiency of risk identification but also provide important support for enterprises in optimizing investment decisions and improving risk management. However, current research and practice remain deficient in how to efficiently integrate multimodal data to achieve more accurate financial risk prediction. Existing financial risk prediction techniques focus mainly on a single data source, such as time-series analysis of historical stock prices or sentiment analysis of news text based on natural language processing. While effective in certain scenarios, this approach has difficulty adequately reflecting the multidimensional driving factors behind market fluctuations. The limitations of single-source prediction methods are becoming more apparent in the face of complex market environments and increasing data diversity. In addition, a large amount of key information in financial markets exists as unstructured data, such as text minutes of earnings conference calls, voice recordings, and media news stories. These data contain implicit features, such as the speaker's intonation, speech rate, and emotional changes, that are important clues for assessing market risk. However, conventional models face technical bottlenecks in processing unstructured data and lack an effective fusion capability for multimodal data, so that their predictions struggle to reflect market dynamics comprehensively and accurately. In short, the prior art suffers from the limitations of a single data source and from poor adaptability and stability of the prediction model, which greatly affect the accuracy and reliability of the prediction results.
In recent years, large language models (Large Language Model, LLM) have been increasingly introduced into the financial domain by virtue of their advantages in cross-domain text processing, sentiment analysis, and multi-task learning. Research shows that large language models can efficiently process long texts, generate high-quality summaries, and perform deep analysis of financial data. However, relying on a single model alone makes it difficult to fully capture the multidimensional dynamics of the financial market. How to integrate multimodal data into a unified prediction framework and combine it with the strengths of large language models for deep analysis has become a core problem to be solved in financial risk prediction. In summary, it is highly necessary to provide a financial risk prediction method based on multimodal data fusion that comprehensively exploits multimodal data, performs deep feature extraction and fusion analysis with a large language model, and, through the integration of multi-source data and the design of a multi-task prediction framework, significantly improves the accuracy and stability of financial risk prediction and provides strong technical support for financial risk management.
Disclosure of Invention
In view of the above, the invention provides a financial risk prediction method based on multimodal data fusion, which integrates multiple data sources, comprehensively captures the operating dynamics and market information of a target company, and predicts several risk indicators simultaneously, achieving good comprehensiveness and accuracy of prediction.
The technical scheme of the invention is realized as follows. The invention provides a financial risk prediction method based on multimodal data fusion, which comprises the following steps:
S1, extracting an audio feature vector from the earnings conference call of a target company;
S2, extracting a text minutes feature vector from the earnings conference call of the target company;
S3, summarizing the text minutes of the earnings conference call of the target company with a large language model to obtain an embedding vector corresponding to the summary of the minutes;
S4, extracting news text features of the target company to obtain a news text feature vector;
S5, extracting features from the time-series transaction data of the target company over a period before the target date to obtain a feature vector of the time-series data;
S6, fusing the obtained audio feature vector, text minutes feature vector, embedding vector corresponding to the summary of the minutes, news text feature vector and time-series feature vector to obtain a joint representation vector; and
S7, inputting the joint representation vector into a multi-task learning framework to predict risk indicators.
Based on the above technical solution, preferably, the specific content of step S1 is:
S11, extracting the embedding vector of the audio of the earnings conference call using the pre-trained WeNetSpeech model; after audio preprocessing, a segment of audio is denoted A_c = {a_1, a_2, ..., a_n}, where a_i represents the i-th frame of the audio and n represents the number of frames; each frame of audio is converted into a vector representation e_i, thereby obtaining the audio embedding vector E_ac of the whole audio segment A_c;
S12, feeding the audio embedding vector E_ac into a multi-head self-attention module MHSA to further extract the audio feature vector T_ac = MHSA(E_ac);
S13, feeding the feature vector T_ac into an average pooling layer to obtain the compressed audio feature vector T_a = AveragePooling(T_ac).
Preferably, each frame of audio is converted into a 512-dimensional vector representation e_i, the dimension of the compressed audio feature vector T_a is 512, the dimension of the text minutes feature vector is 768, the dimension of the embedding vector corresponding to the summary of the minutes is 768, and the dimension of the joint representation vector is 512.
Preferably, the multi-head self-attention module MHSA processes the input audio embedding vector in parallel with several attention heads and computes the attention weights as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where the query vector is Q = E_ac W_Q, the key vector is K = E_ac W_K, the value vector is V = E_ac W_V, W_Q, W_K and W_V are linear projection matrices, d_k is the dimension of each attention head, and the softmax function is used for normalization; the outputs of the attention heads are concatenated and linearly transformed to obtain the audio feature vector T_ac = Concat(head_1, head_2, ..., head_h) W_O, where the subscripts 1, 2, ..., h index the different attention heads, W_O is the output linear transformation matrix, and Concat is the concatenation function.
Further preferably, the specific content of step S2 is:
S21, preprocessing the text minutes of the earnings conference call of the target company to obtain a sentence set T_c = {t_1, t_2, ..., t_L}, where t_l represents the l-th sentence in the minutes, l = 1, 2, ..., L, and L is the number of sentences; each sentence t_l is mapped to an embedding vector of the conference call minutes using the pre-trained Sentence-BERT language model, thereby obtaining the vector representation E_tc of the text minutes of the whole earnings conference call;
S22, feeding the vector representation E_tc of the text minutes of the whole earnings conference call into the multi-head self-attention module MHSA to further extract the minutes feature vector T_tc = MHSA(E_tc);
S23, feeding the minutes feature vector T_tc into an average pooling layer to obtain the compressed text minutes feature vector T_t = AveragePooling(T_tc).
Still more preferably, the specific content of step S3 is:
S31, paragraph segmentation: the text minutes of the earnings conference call of the target company are segmented into logical paragraphs p_1, p_2, ..., p_M, where M denotes the number of paragraphs, and a paragraph-level summary s_m = LLM(p_m) is extracted for each paragraph p_m with the large language model;
S32, merging all paragraph summaries {s_1, s_2, ..., s_M} into a single text and inputting it into the large language model to generate a comprehensive summary S = LLM({s_1, s_2, ..., s_M});
S33, vectorizing the comprehensive summary S with the pre-trained Sentence-BERT model to generate the embedding vector T_l = SBERT(S) corresponding to the summary of the minutes.
Still more preferably, the specific content of step S4 is:
S41, collecting the news text data of the target company from the several days before the target trading day, denoted N = {n_1, n_2, ..., n_K}, where n_i denotes the i-th news item and K is the total number of news items;
S42, analysing each news item n_i with the large language model LLM to extract its metadata m_i = LLM(n_i), and integrating the metadata of all news items into an overall metadata set M_N = {m_1, m_2, ..., m_K};
S43, finding, in the historical news data set, the k historical news groups N_1, N_2, ..., N_k whose metadata are most similar to the metadata of the news text data N from the days before the target trading day; calculating the semantic relevance between N and each historical news group from their embedding vectors f(N) and f(N_j), where f is the function that converts news text into an embedding vector; and selecting the historical news group H with the highest semantic similarity to N;
S44, splicing N, H and the market-trend-related text describing what happened after H, converting the spliced text into an embedding vector T_n with the SBERT model, and using T_n as the news text feature vector of the target company for the days before the target trading day.
Still more preferably, the specific content of step S5 is:
S51, collecting the time-series transaction data D of the target company for the 30 days before the target date, including the daily closing price and the daily trading volume, denoted D = {(p_1, v_1), (p_2, v_2), ..., (p_d, v_d)}, where p_F represents the closing price on day F and v_F represents the trading volume on day F;
S52, inputting the time-series transaction data D into a bidirectional long short-term memory network Bi-LSTM, which captures the temporal dynamics of the transaction data and outputs a feature vector T_v = BiLSTM(D) of the time-series data;
S53, capturing the dynamic relations among the different time-series features through a vector autoregressive (VAR) model:
log(σ_{3,t}) = α_3 + β_{1,1} log(σ_{-3,t}) + β_{1,2} log(σ_{-7,t}) + β_{1,3} log(σ_{-15,t}) + β_{1,4} log(σ_{-30,t}) + u_{3,t};
log(σ_{7,t}) = α_7 + β_{2,1} log(σ_{-3,t}) + β_{2,2} log(σ_{-7,t}) + β_{2,3} log(σ_{-15,t}) + β_{2,4} log(σ_{-30,t}) + u_{7,t};
log(σ_{15,t}) = α_15 + β_{3,1} log(σ_{-3,t}) + β_{3,2} log(σ_{-7,t}) + β_{3,3} log(σ_{-15,t}) + β_{3,4} log(σ_{-30,t}) + u_{15,t};
log(σ_{30,t}) = α_30 + β_{4,1} log(σ_{-3,t}) + β_{4,2} log(σ_{-7,t}) + β_{4,3} log(σ_{-15,t}) + β_{4,4} log(σ_{-30,t}) + u_{30,t};
where σ_{z,t} represents the volatility of the stock price of the target company over z days, z ∈ {3, 7, 15, 30}, σ_{-z,t} denotes the corresponding historical (lagged) z-day volatility, u_{z,t} is a white-noise term, β_{a,b} (a, b = 1, 2, 3, 4) is the coefficient matrix of the dynamic relationship, and α_z is an intercept term; the volatility of the stock price is defined as the standard deviation of the target company's return over the z days.
Still more preferably, the joint representation vector E of step S6 is fused by the following formula: E = w_0 + w_1 T_a + w_2 T_t + w_3 T_l + w_4 T_n + w_5 T_v + ε, where w_0 is a bias term, w_1, w_2, w_3, w_4, w_5 are the fusion weights, and ε is an error term representing random noise.
Still further preferably, the specific content of step S7 is as follows. The joint representation vector E generated in step S6 is used as the input of a multi-task learning framework, which simultaneously predicts the following risk indicators: the volatility σ_{3,t}, σ_{7,t}, σ_{15,t} and σ_{30,t} of the stock price of the target company over 3, 7, 15 and 30 days, and the single-day risk value VaR.
For the stock price volatility of each time span, an independent first prediction sub-network MLP is constructed; the prediction of the stock price volatility is σ̂_{z,t} = f_{MLP,z}(E), where f_{MLP,z}(·) is the first prediction sub-network MLP for time span z.
For the single-day risk value VaR, an independent second prediction sub-network MLP is constructed; its input is the joint representation vector E and its output is the predicted VaR value V̂ = f_{MLP,VaR}(E), where f_{MLP,VaR}(·) is the second prediction sub-network MLP.
A joint loss function optimizes the stock price volatility and VaR prediction tasks simultaneously: L = μ L_vol + (1 - μ) L_VaR, where μ is a weight hyper-parameter balancing the stock price volatility prediction error and the VaR prediction error; y_j and ŷ_j denote the true and predicted values of the stock price volatility indicators, and L_vol = (1/J) Σ_j (y_j - ŷ_j)^2 is the mean-squared error of the stock price volatility predictions; q denotes the quantile threshold of the single-day risk value, V and V̂ denote the true and predicted single-day risk values respectively, and L_VaR = max(q (V - V̂), (q - 1)(V - V̂)) is the quantile-regression loss function of the single-day risk value prediction task.
Compared with the prior art, the financial risk prediction method based on multimodal data fusion has the following beneficial effects:
(1) The invention constructs a brand-new financial risk prediction framework by fusing multimodal data, including earnings conference call audio, text, news text, and time-series data. This multimodal feature fusion significantly improves the perception of financial market risk, captures multi-level information about company operations and the market environment, and solves the problem of insufficient prediction accuracy caused by relying on a single data source in the prior art.
(2) The prior art often ignores the non-explicit features contained in the audio of earnings conference calls, such as the speaker's intonation, speech rate, and emotion. By introducing the multi-head self-attention mechanism MHSA and the large language model LLM to jointly extract and deeply analyse the features of the audio and text data, the invention can reveal latent information and correlations that are difficult to capture with traditional text analysis, thereby improving the accuracy and insight of the prediction.
(3) The invention adopts a multi-task learning framework that can simultaneously predict several risk indicators, including stock price volatility over different time spans and the single-day risk value VaR. This not only improves the prediction efficiency of the model, but also alleviates the dependence of traditional methods on a single task and enhances the adaptability of the model to complex task scenarios. In addition, the joint loss function optimizes the multi-task learning process and further improves prediction performance.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a financial risk prediction method based on multi-modal data fusion.
Detailed Description
The following description of the embodiments of the present invention will clearly and fully describe the technical aspects of the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Most existing financial risk prediction methods rely on a single data source, cannot capture the complex associations between a company's internal operating dynamics and the external market environment, are neither comprehensive nor accurate, and cannot predict multiple risk indicators simultaneously. In view of this, as shown in Fig. 1, the present invention provides a financial risk prediction method based on multimodal data fusion, which includes the following steps:
S1, extracting audio feature vectors from the earnings conference call of the target company.
The specific content of the step S1 is as follows:
S10, audio preprocessing: the audio of the earnings conference call is segmented into frames of fixed length, for example 25 milliseconds, with adjacent frames overlapping by 10 milliseconds to capture the temporal characteristics of continuous speech; the waveform amplitude of the conference audio is normalized to eliminate the influence of recording equipment or volume differences; and the audio is converted into a standard format such as WAV or FLAC for subsequent feature extraction.
S11, extracting the embedding vector of the audio of the earnings conference call using the pre-trained WeNetSpeech model: after audio preprocessing, a segment of audio is denoted A_c = {a_1, a_2, ..., a_n}, where a_i represents the i-th frame of the audio and n represents the number of frames; each frame of audio is converted into a vector representation e_i, thereby obtaining the audio embedding vector E_ac of the whole audio segment A_c. The pre-trained WeNetSpeech model adopts a Transformer architecture and is trained on a large-scale speech dataset, so it can effectively extract the hidden features of speech and is suitable for natural language processing applications.
S12, the audio embedding vector E_ac is fed into a multi-head self-attention (Multi-Head Self-Attention, MHSA) module to further extract the audio feature vector T_ac = MHSA(E_ac). The multi-head self-attention module is a conventional technical means in the art; it usually adopts 8 or 16 attention heads to process the input audio embedding vector in parallel, with each attention head attending in a different way to the global dependencies among audio frames.
The multi-head self-attention module MHSA processes the input audio embedding vector in parallel with several attention heads and computes the attention weights of the audio embedding vector as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where the query vector is Q = E_ac W_Q, the key vector is K = E_ac W_K, the value vector is V = E_ac W_V, W_Q, W_K and W_V are linear projection matrices, d_k is the dimension of each attention head, and the softmax function is used for normalization. The outputs of the attention heads are concatenated and linearly transformed to obtain the audio feature vector T_ac = Concat(head_1, head_2, ..., head_h) W_O, where the subscripts 1, 2, ..., h index the different attention heads, W_O is the output linear transformation matrix, and Concat is the concatenation function.
S13, the feature vector T_ac is fed into an average pooling layer to obtain the compressed audio feature vector T_a = AveragePooling(T_ac).
In order to preserve a sufficient amount of feature information, as a preferred embodiment, each frame of audio is converted into a 512-dimensional vector representation e_i and the dimension of the compressed audio feature vector T_a is 512: the average pooling layer takes the mean over the frames for each dimension of the audio feature vector T_ac, compressing it into a single 512-dimensional vector. This effectively reduces the feature dimension while retaining the global speech information, and the resulting compressed audio feature vector T_a represents implicit information contained in the audio data, such as intonation, emotion and speech rate, providing high-quality input for the subsequent multimodal feature fusion.
In addition, the dimension of the text minutes feature vector is 768, the dimension of the embedding vector corresponding to the summary of the minutes is 768, and the dimension of the joint representation vector is 512.
The invention thus makes full use of the unstructured characteristics of the earnings conference call audio to extract high-dimensional, information-dense audio features, providing important support for the financial risk prediction model.
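As a non-limiting illustration of steps S11 to S13, the following PyTorch sketch passes frame-level audio embeddings through a multi-head self-attention module and averages over frames. The random tensor stands in for the WeNetSpeech frame embeddings, and the 512-dimensional frame size and 8 attention heads are assumptions taken from the preferred embodiment above rather than fixed requirements.

```python
import torch
import torch.nn as nn

class AudioFeatureExtractor(nn.Module):
    """Sketch of S11-S13: frame embeddings E_ac -> MHSA -> average pooling -> T_a."""
    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # 8 attention heads, matching the "usually 8 or 16 heads" remark above
        self.mhsa = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, e_ac: torch.Tensor) -> torch.Tensor:
        # e_ac: (batch, n_frames, embed_dim) frame embeddings from the audio encoder
        t_ac, _ = self.mhsa(e_ac, e_ac, e_ac)   # T_ac = MHSA(E_ac)
        return t_ac.mean(dim=1)                 # average pooling over frames -> T_a

# dummy frame embeddings standing in for real WeNetSpeech output
e_ac = torch.randn(1, 300, 512)                 # 300 frames, assumed 512-d each
t_a = AudioFeatureExtractor()(e_ac)             # shape: (1, 512)
```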
S2, extracting the text minutes feature vector from the earnings conference call of the target company.
The specific content of the step S2 is as follows:
S21, preprocessing the text minutes of the earnings conference call of the target company to obtain a sentence set T_c = {t_1, t_2, ..., t_L}, where t_l represents the l-th sentence in the minutes, l = 1, 2, ..., L, and L is the number of sentences; each sentence t_l is mapped to a 768-dimensional embedding vector of the conference call minutes using the pre-trained Sentence-BERT language model (abbreviated SBERT), thereby obtaining the vector representation E_tc of the text minutes of the whole earnings conference call. SBERT is a sentence-level semantic representation model based on the BERT architecture; it is trained by comparing sentence pairs through a Siamese network, which optimizes its ability to capture semantic similarity between sentences.
The text preprocessing standardizes the text minutes of the earnings conference call of the target company, removes redundant characters and unifies the format, removes meaningless stop words and filler words (such as "um" and "uh"), and splits the minutes into sentences at punctuation marks such as periods and semicolons to obtain the sentence set.
S22, the vector representation E_tc of the text minutes of the whole earnings conference call is fed into the multi-head self-attention module MHSA to further extract the minutes feature vector T_tc = MHSA(E_tc). The MHSA module here is similar to that of step S12, but its role is to capture the global dependencies between sentences: several attention heads each focus on different semantic relations, and after their outputs are concatenated and linearly transformed, the context-enhanced minutes feature vector T_tc is generated.
S23, the minutes feature vector T_tc is fed into an average pooling layer to obtain the compressed text minutes feature vector T_t = AveragePooling(T_tc).
The average is taken over the sentences for each dimension of the minutes feature vector T_tc, compressing it into a single 768-dimensional minutes feature vector that represents the overall semantic information of the text minutes of the whole earnings conference call. It fully preserves the semantic and contextual relations in the text, including key financial information, semantic sentiment, and the logical associations between sentences, and provides high-quality text input for the subsequent multimodal feature fusion. Combining contextual features in this way has significant advantages for capturing the implicit information in the minutes of the earnings conference call, providing an important data basis for financial risk prediction.
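As an illustrative sketch of step S21 (not part of the claimed method), the snippet below encodes a few hypothetical transcript sentences with a 768-dimensional Sentence-BERT checkpoint; the model name all-mpnet-base-v2 is an assumption, since the patent does not name a specific SBERT checkpoint. The resulting sentence matrix E_tc would then go through the same MHSA and average-pooling stack sketched after step S1.

```python
import torch
from sentence_transformers import SentenceTransformer

# Hypothetical sentences from a preprocessed earnings-call transcript
sentences = [
    "Revenue grew twelve percent year over year.",
    "We expect margin pressure in the next quarter.",
    "Free cash flow remained stable despite higher capital spending.",
]

# Any 768-dimensional SBERT checkpoint works here; this one is an assumption.
sbert = SentenceTransformer("all-mpnet-base-v2")
e_tc = torch.tensor(sbert.encode(sentences))   # E_tc: (L, 768) sentence embeddings

print(e_tc.shape)   # torch.Size([3, 768]); feed E_tc to the MHSA + pooling stack to get T_t
```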
S3, the text minutes of the earnings conference call of the target company are summarized with a large language model to obtain an embedding vector corresponding to the summary of the minutes: the minutes are segmented and summarized to extract their core content, and the embedding vector corresponding to the summary is generated for subsequent use.
The specific content of the step S3 is as follows:
S31, paragraph segmentation: the text minutes of the earnings conference call of the target company are segmented into logical paragraphs p_1, p_2, ..., p_M, where M denotes the number of paragraphs; each paragraph p_m is input into the large language model LLM to extract a paragraph-level summary s_m = LLM(p_m), where s_m is the summary of paragraph p_m.
S32, all paragraph summaries {s_1, s_2, ..., s_M} are merged into a single text and input into the large language model to generate a comprehensive summary S = LLM({s_1, s_2, ..., s_M}); after the comprehensive summary is generated, its granularity can be further adjusted as required, for example by controlling the length of longer summaries or removing redundant information.
S33, the comprehensive summary S is vectorized with the pre-trained Sentence-BERT model to generate the embedding vector T_l = SBERT(S) corresponding to the summary of the minutes. T_l is a 768-dimensional feature vector representing the semantic information of the minutes; it fuses the core content and global information of the conference call text and provides important support for the multimodal feature fusion.
Through these steps of paragraph segmentation, paragraph-level summarization and comprehensive summarization of the earnings call minutes, the generated embedding vector T_l effectively condenses the key content of the text and can significantly improve the efficiency with which the financial risk prediction model exploits textual data, as well as its prediction accuracy.
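A minimal sketch of the hierarchical summarization in steps S31 to S33 is shown below; llm_summarize is a hypothetical placeholder for whichever LLM API is used (the patent does not name one), and the SBERT checkpoint is the same assumed 768-dimensional model as in the earlier sketch.

```python
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer

def llm_summarize(text: str) -> str:
    """Hypothetical stand-in for s_m = LLM(p_m); wire this to an actual LLM API."""
    raise NotImplementedError("connect to an LLM service here")

def minutes_summary_vector(paragraphs: List[str]) -> np.ndarray:
    # S31: paragraph-level summaries s_m = LLM(p_m)
    paragraph_summaries = [llm_summarize(p) for p in paragraphs]
    # S32: merge the paragraph summaries and summarize again into S
    comprehensive_summary = llm_summarize("\n".join(paragraph_summaries))
    # S33: T_l = SBERT(S), a 768-d embedding of the comprehensive summary
    sbert = SentenceTransformer("all-mpnet-base-v2")
    return sbert.encode(comprehensive_summary)   # T_l
```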
S4, extracting news text features of the target company to obtain a news text feature vector: semantic analysis and similarity calculation are performed on news texts related to the target company, and the news text feature vector for the days before the target trading day is generated.
The specific content of the step S4 is as follows:
S41, collecting the news text data of the target company from the several days before the target trading day, denoted N = {n_1, n_2, ..., n_K}, where n_i denotes the i-th news item and K is the total number of news items.
S42, analysing each news item n_i with the large language model LLM to extract its metadata m_i = LLM(n_i); the extracted metadata include the sentiment tendency (positive, negative or neutral), the financial indicators mentioned in the news (such as revenue and profit), and other key information related to the target company. The metadata of all news items are integrated into an overall metadata set M_N = {m_1, m_2, ..., m_K}.
S43, finding, in the historical news data set, the k historical news groups N_1, N_2, ..., N_k whose metadata are most similar to the metadata of the news text data N from the days before the target trading day; the semantic relevance between N and each historical news group is calculated from their embedding vectors f(N) and f(N_j), where f is the function that converts news text into an embedding vector, and the historical news group H with the highest semantic similarity to N is selected.
S44, splicing N, H and the market-trend-related text describing what happened after H, converting the spliced text into an embedding vector T_n with the SBERT model, and using T_n as the news text feature vector of the target company for the days before the target trading day.
The spliced text therefore draws on three sources: (1) the news text data N from the days before the target trading day, (2) the historical news H, and (3) the market-trend-related text after H occurred, that is, the news published after the historical news H. For example, if the historical news H of January 1 reports the debt crisis of Company A, and the news of January 2 reports that the share price of Company A fell sharply, then the January 2 news is the market-trend-related text after the occurrence of H.
Through step S4, the generated news text feature vector T_n of the target company for the days before the target trading day effectively fuses the semantic information and sentiment analysis results of the target news with its similarity relations to historical news, providing high-quality input features for the subsequent multimodal feature fusion.
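For illustration only, the sketch below instantiates the retrieval in steps S43 and S44 with SBERT embeddings and cosine similarity; the similarity measure, the checkpoint name and the helper names are assumptions, since the patent describes only an embedding function f and a highest-semantic-similarity criterion.

```python
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-mpnet-base-v2")   # stands in for the embedding function f

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_history(current_news: List[str], history_groups: List[List[str]]) -> List[str]:
    """S43: pick the historical news group H closest to the current news N."""
    n_vec = sbert.encode(" ".join(current_news))
    scores = [cosine(n_vec, sbert.encode(" ".join(group))) for group in history_groups]
    return history_groups[int(np.argmax(scores))]

def news_feature_vector(current_news: List[str], h: List[str], trend_text: str) -> np.ndarray:
    """S44: splice N, H and the post-H market-trend text, then embed the result as T_n."""
    spliced = " ".join(current_news) + " " + " ".join(h) + " " + trend_text
    return sbert.encode(spliced)   # T_n
```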
S5, extracting features from the time-series transaction data of the target company over a period before the target date to obtain the feature vector of the time-series data. As a preferred embodiment, this step processes the transaction data of the target company for the 30 days before the target date to extract key features and capture the dynamic relations among different time-series features.
The specific content of the step S5 is as follows:
S51, collecting the time-series transaction data D of the target company for the 30 days before the target date, including the daily closing price and the daily trading volume, denoted D = {(p_1, v_1), (p_2, v_2), ..., (p_d, v_d)}, where p_F represents the closing price on day F and v_F represents the trading volume on day F;
S52, inputting the time-series transaction data D into a bidirectional long short-term memory network Bi-LSTM, which models the time series, captures the temporal dynamics of the transaction data, and outputs a 128-dimensional feature vector T_v = BiLSTM(D) of the time-series data.
S53, capturing the dynamic relations among the different time-series features through a vector autoregressive (VAR) model:
log(σ_{3,t}) = α_3 + β_{1,1} log(σ_{-3,t}) + β_{1,2} log(σ_{-7,t}) + β_{1,3} log(σ_{-15,t}) + β_{1,4} log(σ_{-30,t}) + u_{3,t};
log(σ_{7,t}) = α_7 + β_{2,1} log(σ_{-3,t}) + β_{2,2} log(σ_{-7,t}) + β_{2,3} log(σ_{-15,t}) + β_{2,4} log(σ_{-30,t}) + u_{7,t};
log(σ_{15,t}) = α_15 + β_{3,1} log(σ_{-3,t}) + β_{3,2} log(σ_{-7,t}) + β_{3,3} log(σ_{-15,t}) + β_{3,4} log(σ_{-30,t}) + u_{15,t};
log(σ_{30,t}) = α_30 + β_{4,1} log(σ_{-3,t}) + β_{4,2} log(σ_{-7,t}) + β_{4,3} log(σ_{-15,t}) + β_{4,4} log(σ_{-30,t}) + u_{30,t};
where σ_{z,t} represents the volatility of the stock price of the target company over z days, z ∈ {3, 7, 15, 30}, σ_{-z,t} denotes the corresponding historical (lagged) z-day volatility, u_{z,t} is a white-noise term, β_{a,b} (a, b = 1, 2, 3, 4) is the coefficient matrix of the dynamic relationship, and α_z is an intercept term. The volatility of the stock price is defined as the standard deviation of the target company's return over the z days. The horizons z = 3, 7, 15 and 30 are given here only by way of example and do not limit the scheme itself.
Through step S5, the time-series transaction data are converted into the high-dimensional feature vector T_v and the parameters of the dynamic relationship, providing key temporal information and dynamic-relation support for financial risk prediction.
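The sketch below illustrates step S52 and the volatility definition used in step S53. The single-layer Bi-LSTM with 64 hidden units per direction (giving the 128-dimensional T_v mentioned above) is an assumed configuration; the VAR regressions themselves could then be estimated on the resulting log-volatility series with a standard econometrics package.

```python
import numpy as np
import torch
import torch.nn as nn

class TimeSeriesEncoder(nn.Module):
    """Sketch of S52: Bi-LSTM over 30 days of (closing price, volume) pairs -> T_v."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        # bidirectional, so concatenating the two final hidden states gives 2 * 64 = 128 dims
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            batch_first=True, bidirectional=True)

    def forward(self, d: torch.Tensor) -> torch.Tensor:
        # d: (batch, 30, 2) daily (closing price, trading volume)
        _, (h_n, _) = self.lstm(d)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # T_v: (batch, 128)

def volatility(prices: np.ndarray, z: int) -> float:
    """sigma_{z,t}: standard deviation of the daily returns over the last z days."""
    window = prices[-(z + 1):]
    returns = np.diff(window) / window[:-1]
    return float(np.std(returns))

# usage with dummy data
t_v = TimeSeriesEncoder()(torch.randn(1, 30, 2))                  # shape: (1, 128)
sigma_3 = volatility(np.cumprod(1 + 0.01 * np.random.randn(31)), z=3)
```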
S6, fusing the obtained audio feature vector, text minutes feature vector, embedding vector corresponding to the summary of the minutes, news text feature vector and feature vector of the time-series data to obtain the joint representation vector.
The joint representation vector E of step S6 is fused by the following formula: E = w_0 + w_1 T_a + w_2 T_t + w_3 T_l + w_4 T_n + w_5 T_v + ε, where w_0 is a bias term, w_1, w_2, w_3, w_4, w_5 are the fusion weights, and ε is an error term representing random noise.
The dimension of the joint representation vector E is fixed to 512 dimensions, representing the integrated features of the multimodal data. The vector retains important information of each modal feature and can be used for subsequent multi-task prediction.
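Below is a minimal PyTorch sketch of the weighted fusion in step S6. Because the branch vectors listed above have different dimensions (512, 768, 768, 768 and 128) while E is fixed at 512 dimensions, the sketch adds per-modality linear projections before the weighted sum; those projections are an assumption of this illustration, not something the patent states.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    """Sketch of S6: E = w0 + w1*T_a + w2*T_t + w3*T_l + w4*T_n + w5*T_v."""
    def __init__(self, dims=(512, 768, 768, 768, 128), out_dim: int = 512):
        super().__init__()
        # assumed projections that bring every modality vector into the shared 512-d space
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in dims)
        self.w = nn.Parameter(torch.ones(len(dims)))    # learnable fusion weights w1..w5
        self.w0 = nn.Parameter(torch.zeros(out_dim))    # bias term w0

    def forward(self, t_a, t_t, t_l, t_n, t_v):
        parts = [w * p(v) for w, p, v in zip(self.w, self.proj, (t_a, t_t, t_l, t_n, t_v))]
        return self.w0 + torch.stack(parts, dim=0).sum(dim=0)   # joint representation E

# usage with dummy branch outputs
fusion = ModalityFusion()
e = fusion(torch.randn(1, 512), torch.randn(1, 768), torch.randn(1, 768),
           torch.randn(1, 768), torch.randn(1, 128))             # shape: (1, 512)
```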
S7, inputting the joint representation vector into the multi-task learning framework to predict the risk indicators.
Specifically, the joint representation vector E generated in step S6 is used as the input of the multi-task learning framework, which simultaneously predicts the following risk indicators: the volatility σ_{3,t}, σ_{7,t}, σ_{15,t} and σ_{30,t} of the stock price of the target company over 3, 7, 15 and 30 days, and the single-day risk value VaR.
For the stock price volatility of each time span, an independent first prediction sub-network MLP is constructed; the prediction of the stock price volatility is σ̂_{z,t} = f_{MLP,z}(E), where f_{MLP,z}(·) is the first prediction sub-network MLP for time span z.
For the single-day risk value VaR, an independent second prediction sub-network MLP is constructed; its input is the joint representation vector E and its output is the predicted VaR value V̂ = f_{MLP,VaR}(E), where f_{MLP,VaR}(·) is the second prediction sub-network MLP.
A joint loss function optimizes the stock price volatility and VaR prediction tasks simultaneously: L = μ L_vol + (1 - μ) L_VaR, where μ is a weight hyper-parameter balancing the stock price volatility prediction error and the VaR prediction error; y_j and ŷ_j denote the true and predicted values of the stock price volatility indicators, and L_vol = (1/J) Σ_j (y_j - ŷ_j)^2 is the mean-squared error of the stock price volatility predictions; q denotes the quantile threshold of the single-day risk value, V and V̂ denote the true and predicted single-day risk values respectively, and L_VaR = max(q (V - V̂), (q - 1)(V - V̂)) is the quantile-regression loss function of the single-day risk value prediction task.
Through step S7, the multi-task learning framework effectively integrates the multimodal features and accurately predicts the volatility over different time spans as well as the single-day risk value, providing reliable support for financial risk management.
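To make the multi-task head and joint loss of step S7 concrete, here is an illustrative PyTorch sketch; the hidden sizes, the pinball form of the quantile loss and the exact placement of the balancing weight mu are assumptions consistent with the description above, not the patent's definitive formulation.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Sketch of S7: one MLP per volatility horizon plus one MLP for the single-day VaR."""
    def __init__(self, in_dim: int = 512, horizons=(3, 7, 15, 30)):
        super().__init__()
        self.vol_heads = nn.ModuleDict({
            str(z): nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 1))
            for z in horizons})
        self.var_head = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, e: torch.Tensor):
        vol = {z: head(e).squeeze(-1) for z, head in self.vol_heads.items()}
        return vol, self.var_head(e).squeeze(-1)

def joint_loss(vol_pred, vol_true, var_pred, var_true, mu: float = 0.5, q: float = 0.05):
    # mean-squared error across the volatility horizons
    mse = torch.stack([(vol_pred[z] - vol_true[z]).pow(2).mean() for z in vol_pred]).mean()
    # quantile-regression (pinball) loss for the single-day VaR at quantile level q
    diff = var_true - var_pred
    pinball = torch.max(q * diff, (q - 1) * diff).mean()
    return mu * mse + (1 - mu) * pinball

# usage with dummy tensors
head = MultiTaskHead()
vol, var = head(torch.randn(4, 512))
loss = joint_loss(vol, {z: torch.rand(4) for z in vol}, var, torch.randn(4))
```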
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A financial risk prediction method based on multimodal data fusion, characterized by comprising the following steps:
S1, extracting an audio feature vector from the earnings conference call of a target company;
S2, extracting a text minutes feature vector from the earnings conference call of the target company;
S3, summarizing the text minutes of the earnings conference call of the target company with a large language model to obtain an embedding vector corresponding to the summary of the minutes;
S4, extracting news text features of the target company to obtain a news text feature vector;
S5, extracting features from the time-series transaction data of the target company over a period before the target date to obtain a feature vector of the time-series data;
S6, fusing the obtained audio feature vector, text minutes feature vector, embedding vector corresponding to the summary of the minutes, news text feature vector and time-series feature vector to obtain a joint representation vector; and
S7, inputting the joint representation vector into a multi-task learning framework to predict risk indicators.
2. The financial risk prediction method based on multimodal data fusion according to claim 1, wherein the specific content of step S1 is:
S11, extracting the embedding vector of the audio of the earnings conference call using the pre-trained WeNetSpeech model; after audio preprocessing, a segment of audio is denoted A_c = {a_1, a_2, ..., a_n}, where a_i represents the i-th frame of the audio and n represents the number of frames; each frame of audio is converted into a vector representation e_i, thereby obtaining the audio embedding vector E_ac of the whole audio segment A_c;
S12, feeding the audio embedding vector E_ac into a multi-head self-attention module MHSA to further extract the audio feature vector T_ac = MHSA(E_ac);
S13, feeding the feature vector T_ac into an average pooling layer to obtain the compressed audio feature vector T_a = AveragePooling(T_ac).
3. The financial risk prediction method based on multimodal data fusion according to claim 2, wherein each frame of audio is converted into a 512-dimensional vector representation e_i, the dimension of the compressed audio feature vector T_a is 512, the dimension of the text minutes feature vector is 768, the dimension of the embedding vector corresponding to the summary of the minutes is 768, and the dimension of the joint representation vector is 512.
4. The financial risk prediction method based on multimodal data fusion according to claim 2, wherein the multi-head self-attention module MHSA processes the input audio embedding vector in parallel with several attention heads and computes the attention weights of the audio embedding vector as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where the query vector is Q = E_ac W_Q, the key vector is K = E_ac W_K, the value vector is V = E_ac W_V, W_Q, W_K and W_V are linear projection matrices, d_k is the dimension of each attention head, and the softmax function is used for normalization; the outputs of the attention heads are concatenated and linearly transformed to obtain the audio feature vector T_ac = Concat(head_1, head_2, ..., head_h) W_O, where the subscripts 1, 2, ..., h index the different attention heads, W_O is the output linear transformation matrix, and Concat is the concatenation function.
5. The financial risk prediction method based on multimodal data fusion according to claim 4, wherein the specific content of step S2 is:
S21, preprocessing the text minutes of the earnings conference call of the target company to obtain a sentence set T_c = {t_1, t_2, ..., t_L}, where t_l represents the l-th sentence in the minutes, l = 1, 2, ..., L, and L is the number of sentences; each sentence t_l is mapped to an embedding vector of the conference call minutes using the pre-trained Sentence-BERT language model, thereby obtaining the vector representation E_tc of the text minutes of the whole earnings conference call;
S22, feeding the vector representation E_tc of the text minutes of the whole earnings conference call into the multi-head self-attention module MHSA to further extract the minutes feature vector T_tc = MHSA(E_tc);
S23, feeding the minutes feature vector T_tc into an average pooling layer to obtain the compressed text minutes feature vector T_t = AveragePooling(T_tc).
6. The financial risk prediction method based on multimodal data fusion according to claim 5, wherein the specific content of step S3 is:
S31, paragraph segmentation: the text minutes of the earnings conference call of the target company are segmented into logical paragraphs p_1, p_2, ..., p_M, where M denotes the number of paragraphs, and a paragraph-level summary s_m = LLM(p_m) is extracted for each paragraph p_m with the large language model;
S32, merging all paragraph summaries {s_1, s_2, ..., s_M} into a single text and inputting it into the large language model to generate a comprehensive summary S = LLM({s_1, s_2, ..., s_M});
S33, vectorizing the comprehensive summary S with the pre-trained Sentence-BERT model to generate the embedding vector T_l = SBERT(S) corresponding to the summary of the minutes.
7. The financial risk prediction method based on multimodal data fusion according to claim 6, wherein the specific content of step S4 is:
S41, collecting the news text data of the target company from the several days before the target trading day, denoted N = {n_1, n_2, ..., n_K}, where n_i denotes the i-th news item and K is the total number of news items;
S42, analysing each news item n_i with the large language model LLM to extract its metadata m_i = LLM(n_i), and integrating the metadata of all news items into an overall metadata set M_N = {m_1, m_2, ..., m_K};
S43, finding, in the historical news data set, the k historical news groups N_1, N_2, ..., N_k whose metadata are most similar to the metadata of the news text data N from the days before the target trading day; calculating the semantic relevance between N and each historical news group from their embedding vectors f(N) and f(N_j), where f is the function that converts news text into an embedding vector; and selecting the historical news group H with the highest semantic similarity to N;
S44, splicing N, H and the market-trend-related text describing what happened after H, converting the spliced text into an embedding vector T_n with the SBERT model, and using T_n as the news text feature vector of the target company for the days before the target trading day.
8. The financial risk prediction method based on multimodal data fusion according to claim 7, wherein the specific content of step S5 is:
S51, collecting the time-series transaction data D of the target company for the 30 days before the target date, including the daily closing price and the daily trading volume, denoted D = {(p_1, v_1), (p_2, v_2), ..., (p_d, v_d)}, where p_F represents the closing price on day F and v_F represents the trading volume on day F;
S52, inputting the time-series transaction data D into a bidirectional long short-term memory network Bi-LSTM, which captures the temporal dynamics of the transaction data and outputs a feature vector T_v = BiLSTM(D) of the time-series data;
S53, capturing the dynamic relations among the different time-series features through a vector autoregressive (VAR) model:
log(σ_{3,t}) = α_3 + β_{1,1} log(σ_{-3,t}) + β_{1,2} log(σ_{-7,t}) + β_{1,3} log(σ_{-15,t}) + β_{1,4} log(σ_{-30,t}) + u_{3,t};
log(σ_{7,t}) = α_7 + β_{2,1} log(σ_{-3,t}) + β_{2,2} log(σ_{-7,t}) + β_{2,3} log(σ_{-15,t}) + β_{2,4} log(σ_{-30,t}) + u_{7,t};
log(σ_{15,t}) = α_15 + β_{3,1} log(σ_{-3,t}) + β_{3,2} log(σ_{-7,t}) + β_{3,3} log(σ_{-15,t}) + β_{3,4} log(σ_{-30,t}) + u_{15,t};
log(σ_{30,t}) = α_30 + β_{4,1} log(σ_{-3,t}) + β_{4,2} log(σ_{-7,t}) + β_{4,3} log(σ_{-15,t}) + β_{4,4} log(σ_{-30,t}) + u_{30,t};
where σ_{z,t} represents the volatility of the stock price of the target company over z days, z ∈ {3, 7, 15, 30}, σ_{-z,t} denotes the corresponding historical (lagged) z-day volatility, u_{z,t} is a white-noise term, β_{a,b} (a, b = 1, 2, 3, 4) is the coefficient matrix of the dynamic relationship, and α_z is an intercept term; the volatility of the stock price is defined as the standard deviation of the target company's return over the z days.
9. The financial risk prediction method based on multimodal data fusion according to claim 8, wherein the joint representation vector E in step S6 is fused by the following formula: E = w_0 + w_1 T_a + w_2 T_t + w_3 T_l + w_4 T_n + w_5 T_v + ε, where w_0 is a bias term, w_1, w_2, w_3, w_4, w_5 are the fusion weights, and ε is an error term representing random noise.
10. The financial risk prediction method based on multimodal data fusion according to claim 9, wherein the specific content of step S7 is: the joint representation vector E generated in step S6 is used as the input of a multi-task learning framework, which simultaneously predicts the volatility σ_{3,t}, σ_{7,t}, σ_{15,t} and σ_{30,t} of the stock price of the target company over 3, 7, 15 and 30 days and the single-day risk value VaR;
for the stock price volatility of each time span, an independent first prediction sub-network MLP is constructed, and the prediction of the stock price volatility is σ̂_{z,t} = f_{MLP,z}(E), where f_{MLP,z}(·) is the first prediction sub-network MLP for time span z;
for the single-day risk value VaR, an independent second prediction sub-network MLP is constructed, whose input is the joint representation vector E and whose output is the predicted VaR value V̂ = f_{MLP,VaR}(E), where f_{MLP,VaR}(·) is the second prediction sub-network MLP;
a joint loss function optimizes the stock price volatility and VaR prediction tasks simultaneously: L = μ L_vol + (1 - μ) L_VaR, where μ is a weight hyper-parameter balancing the stock price volatility prediction error and the VaR prediction error; y_j and ŷ_j denote the true and predicted values of the stock price volatility indicators, and L_vol = (1/J) Σ_j (y_j - ŷ_j)^2 is the mean-squared error of the stock price volatility predictions; q denotes the quantile threshold of the single-day risk value, V and V̂ denote the true and predicted single-day risk values respectively, and L_VaR = max(q (V - V̂), (q - 1)(V - V̂)) is the quantile-regression loss function of the single-day risk value prediction task.
CN202510343420.8A 2025-03-21 2025-03-21 A financial risk prediction method based on multimodal data fusion Pending CN120258951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510343420.8A CN120258951A (en) 2025-03-21 2025-03-21 A financial risk prediction method based on multimodal data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510343420.8A CN120258951A (en) 2025-03-21 2025-03-21 A financial risk prediction method based on multimodal data fusion

Publications (1)

Publication Number Publication Date
CN120258951A true CN120258951A (en) 2025-07-04

Family

ID=96182636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510343420.8A Pending CN120258951A (en) 2025-03-21 2025-03-21 A financial risk prediction method based on multimodal data fusion

Country Status (1)

Country Link
CN (1) CN120258951A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119103A (en) * 1997-05-27 2000-09-12 Visa International Service Association Financial risk prediction systems and methods therefor
WO2023065545A1 (en) * 2021-10-19 2023-04-27 平安科技(深圳)有限公司 Risk prediction method and apparatus, and device and storage medium
CN116402630A (en) * 2023-06-09 2023-07-07 深圳市迪博企业风险管理技术有限公司 A financial risk prediction method and system based on representation learning
CN117011080A (en) * 2023-08-07 2023-11-07 中国工商银行股份有限公司 Financial risk prediction methods, devices, equipment, media and program products
DE202024102877U1 (en) * 2024-06-01 2024-07-17 Ashwini Arte Artificial intelligence-based financial risk assessment system
CN118780615A (en) * 2024-08-05 2024-10-15 威海蓝海银行股份有限公司 Enterprise financial risk prediction method based on multi-source data fusion
CN118898046A (en) * 2024-07-10 2024-11-05 广东工业大学 A multimodal sentiment analysis method combining pre-trained model and self-attention block



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination