Disclosure of Invention
In view of the above, the invention provides a financial risk prediction method based on multi-modal data fusion, which integrates multiple data sources, comprehensively captures the operational dynamics and market information of a target company, predicts multiple risk indexes simultaneously, and achieves good prediction comprehensiveness and accuracy.
The technical scheme of the invention is realized in such a way that the invention provides a financial risk prediction method based on multi-mode data fusion, which comprises the following steps:
S1, extracting an audio feature vector from the financial report teleconference of the target company;
S2, extracting a text summary feature vector from the financial report teleconference of the target company;
S3, summarizing the text summary of the financial report teleconference of the target company by using a large language model to obtain an embedded vector corresponding to the summary of the text summary;
S4, extracting news text features of the target company to obtain a news text feature vector;
S5, extracting features from the time sequence transaction data of the target company over a period before the target date to obtain a feature vector of the time sequence data;
S6, fusing the obtained audio feature vector, text summary feature vector, embedded vector corresponding to the summary of the text summary, news text feature vector and feature vector of the time sequence data to obtain a joint representation vector; and
S7, inputting the joint representation vector into a multi-task learning framework to predict the risk indexes.
Based on the above technical solution, preferably, the specific content of step S1 is:
S11, extracting an embedded vector of the audio of the financial report teleconference by using the WENETSPEECH pre-trained model: audio preprocessing is performed on a section of audio A_c = {a_1, a_2, ..., a_n}, wherein a_i represents the i-th frame of the audio and n represents the number of frames in the audio; each frame of audio is converted into a vector representation e_i, thereby obtaining the audio embedded vector E_ac = {e_1, e_2, ..., e_n} of the whole audio A_c;
S12, sending the audio embedded vector E_ac into a multi-head self-attention module MHSA to further extract the feature vector of the audio, T_ac = MHSA(E_ac);
S13, sending the feature vector T_ac to an average pooling layer (Average Pooling Layer) to obtain the compressed audio feature vector T_a, T_a = AveragePooling(T_ac).
Preferably, the vector representation e_i of each frame of audio is 512-dimensional, the dimension of the compressed audio feature vector T_a is 512, the dimension of the text summary feature vector is 768, the dimension of the embedded vector corresponding to the summary of the text summary is 768, and the dimension of the joint representation vector is 512.
Preferably, the multi-head self-attention module MHSA processes the input audio embedded vector in parallel using a plurality of attention heads, and calculates the attention weight of the audio embedded vector using the following formula: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, wherein the query vector Q = E_ac W_Q, the key vector K = E_ac W_K and the value vector V = E_ac W_V; W_Q, W_K and W_V are linear projection matrices, d_k is the dimension of the attention heads, and the softmax function is used for normalization; the outputs of the attention heads are spliced and passed through a linear transformation to obtain the audio feature vector T_ac = Concat(head_1, head_2, ..., head_h) W_O, wherein the subscripts 1, 2, ..., h distinguish the different attention heads, W_O is the output linear transformation matrix, and Concat is the concatenation function.
Further preferably, the specific content of step S2 is:
S21, preprocessing the text summary of the financial report teleconference of the target company to obtain a sentence collection T_c = {t_c^1, t_c^2, ..., t_c^L}, wherein t_c^l represents the l-th sentence in the text summary, l = 1, 2, ..., L, and L represents the number of sentences in the text summary; using the Sentence-BERT pre-trained language model, each sentence t_c^l is mapped to an embedded vector e_t^l of the financial report teleconference text summary, thereby obtaining the vector representation E_tc = {e_t^1, e_t^2, ..., e_t^L} of the text summary of the whole financial report teleconference;
S22, sending the vector representation E_tc of the text summary of the whole financial report teleconference to a multi-head self-attention module MHSA to further extract the feature vector of the text summary, T_tc = MHSA(E_tc);
S23, sending the text summary feature vector T_tc into an average pooling layer (Average Pooling Layer) to obtain the compressed text summary feature vector T_t = AveragePooling(T_tc).
Still more preferably, the specific content of step S3 is:
S31, paragraph segmentation: segmenting the text summary of the financial report teleconference of the target company according to logical paragraphs to obtain paragraphs P = {p_1, p_2, ..., p_M}, m = 1, 2, ..., M, wherein M represents the number of paragraphs; each paragraph p_m is input into the large language model LLM to extract a paragraph-level abstract s_m = LLM(p_m);
S32, merging all paragraph summaries {s_1, s_2, ..., s_M} into a whole text and inputting it into the large language model to generate a comprehensive summary S = LLM({s_1, s_2, ..., s_M});
S33, vectorizing the comprehensive summary S by using the Sentence-BERT pre-trained language model to generate the embedded vector corresponding to the summary of the text summary, T_l = SBERT(S).
Still more preferably, the specific content of step S4 is:
S41, collecting news text data of the target company from several days before the target transaction day, represented as N = {n_1, n_2, ..., n_K}, wherein n_k represents the k-th news text and K is the total number of news items;
S42, analyzing each news item n_k by using the large language model LLM to extract its metadata m_k = LLM(n_k), and integrating the metadata of all news items into a whole metadata set M_N = {m_1, m_2, ..., m_K};
S43, finding, in the historical news dataset, the k historical news groups N_1, N_2, ..., N_k whose metadata are most similar to that of the news text data N from several days before the target transaction day; calculating the semantic relevance between N and each historical news group from their embedded vectors, wherein f is the function converting news texts into embedded vectors; and selecting the group of historical news H with the highest semantic similarity to the news text data N from several days before the target transaction day;
S44, splicing N, H and the market-trend-related texts observed after H occurred, and then converting the spliced text into an embedded vector T_n using the SBERT model, which serves as the news text feature vector of the target company for the days before the target transaction day.
Still more preferably, the specific content of step S5 is:
S51, collecting the time sequence transaction data D of the target company for the 30 days before the target date, including the daily closing price and daily deal volume, expressed as D = {(p_1, v_1), (p_2, v_2), ..., (p_d, v_d)}, wherein p_f represents the closing price on the f-th day and v_f represents the deal volume on the f-th day, f = 1, 2, ..., d;
S52, inputting the time sequence transaction data D into a bi-directional long short-term memory network Bi-LSTM, which captures the temporal dynamic characteristics of the transaction data and outputs the feature vector of the time sequence data, T_v = BiLSTM(D);
S53, capturing dynamic relations among different time sequence features through a vector autoregressive VAR model:
log(σ_{3,t}) = α_3 + β_{1,1} log(σ_{-3,t}) + β_{1,2} log(σ_{-7,t}) + β_{1,3} log(σ_{-15,t}) + β_{1,4} log(σ_{-30,t}) + u_{3,t};
log(σ_{7,t}) = α_7 + β_{2,1} log(σ_{-3,t}) + β_{2,2} log(σ_{-7,t}) + β_{2,3} log(σ_{-15,t}) + β_{2,4} log(σ_{-30,t}) + u_{7,t};
log(σ_{15,t}) = α_{15} + β_{3,1} log(σ_{-3,t}) + β_{3,2} log(σ_{-7,t}) + β_{3,3} log(σ_{-15,t}) + β_{3,4} log(σ_{-30,t}) + u_{15,t};
log(σ_{30,t}) = α_{30} + β_{4,1} log(σ_{-3,t}) + β_{4,2} log(σ_{-7,t}) + β_{4,3} log(σ_{-15,t}) + β_{4,4} log(σ_{-30,t}) + u_{30,t};
wherein σ_{z,t} represents the volatility of the stock price of the target company over z days, z ∈ {3, 7, 15, 30}, and σ_{-z,t} is the corresponding volatility over the preceding z days; u_{z,t} is a white noise term, β_{a,b} (a, b = 1, 2, 3, 4) is the coefficient matrix of the dynamic relationship, and α_z is an intercept term; the volatility of the stock price is defined as the standard deviation of the daily return rate of the target company over the z days: σ_{z,t} = sqrt((1/(z−1)) Σ_{i=1}^{z} (r_{t−i} − r̄)^2), wherein r_{t−i} is the daily return rate and r̄ is its mean over the z days.
Still more preferably, the joint representation vector E of step S6 is fused by the following formula: E = w_0 + w_1 T_a + w_2 T_t + w_3 T_l + w_4 T_n + w_5 T_v + ε, where w_0 is the bias term, w_1, w_2, w_3, w_4, w_5 are the fusion weights, and ε is the error term representing random noise.
Still further preferably, the specific content of step S7 is: using the joint representation vector E generated in step S6 as the input of a multi-task learning framework, which predicts risk indexes including the volatility σ_{3,t}, σ_{7,t}, σ_{15,t} and σ_{30,t} of the stock price of the target company over 3 days, 7 days, 15 days and 30 days, and the single-day risk value VAR;
constructing an independent first prediction sub-network MLP for the stock price volatility of each time span, the prediction result of the stock price volatility being σ̂_{z,t} = F_{MLP,z}(E), wherein F_{MLP,z}(·) is the first prediction sub-network MLP for the time span z;
for the single-day risk value VAR, constructing an independent second prediction sub-network MLP whose input is the joint representation vector E and whose output is the predicted value of VAR, V̂ = F_{MLP,VAR}(E), wherein F_{MLP,VAR}(·) is the second prediction sub-network MLP;
the joint loss function simultaneously optimizes the stock price volatility and VAR prediction tasks, and its formula is: L = L_vol + μ·L_VAR, wherein μ is a weight hyperparameter for balancing the stock price volatility prediction error and the VAR prediction error; L_vol = (1/J) Σ_{j=1}^{J} (y_j − ŷ_j)^2 represents the mean square error of the stock price volatility prediction, with y_j and ŷ_j representing the true value and the predicted value of the stock price volatility index; L_VAR = max(q(V − V̂), (q − 1)(V − V̂)) represents the quantile regression loss function of the single-day risk value prediction task, wherein q represents the quantile threshold of the single-day risk value, and V and V̂ respectively represent the real single-day risk value and the predicted single-day risk value.
Compared with the prior art, the financial risk prediction method based on multi-mode data fusion has the following beneficial effects:
(1) The invention constructs a brand-new financial risk prediction framework by fusing multi-modal data, including financial report teleconference audio, text, news text and time sequence data. This multi-modal feature fusion method markedly improves the perception of financial market risk, can capture multi-level information about company operations and the market environment, and solves the problem of insufficient prediction accuracy caused by the single data sources of the prior art;
(2) The prior art often ignores the non-explicit features contained in the audio of the financial report teleconference, such as the speaker's intonation, speech speed and emotion. By introducing the multi-head self-attention mechanism MHSA and the large language model LLM to jointly extract and deeply analyze the features of the audio and text data, the invention can reveal latent information that is difficult to capture with traditional text analysis methods, thereby improving the accuracy and insight of the prediction;
(3) The invention adopts a multi-task learning framework, can simultaneously predict various risk indexes, including stock price volatility and daily risk value VAR of different time spans, and the technical scheme not only improves the prediction efficiency of the model, but also effectively relieves the dependence of the traditional method on a single task and enhances the adaptability of the model to complex task scenes. In addition, by introducing the joint loss function, the multi-task learning process is optimized, and the prediction performance is further improved.
Detailed Description
The following description of the embodiments of the present invention will clearly and fully describe the technical aspects of the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Most of the existing financial risk predictions adopt a single data source, cannot capture complex association between enterprise internal operation dynamics and external market environments, are not comprehensive and accurate, and cannot predict multiple risk indexes simultaneously. In view of this, as shown in fig. 1, the present invention provides a financial risk prediction method based on multi-modal data fusion, which includes the following steps:
S1, extracting an audio feature vector from the financial report teleconference of the target company.
The specific content of the step S1 is as follows:
S10, audio preprocessing: segmenting the audio of the financial report teleconference frame by frame, with a fixed length for each frame, for example 25 milliseconds, and a 10-millisecond overlap between adjacent frames to ensure that the temporal characteristics of continuous speech are captured; normalizing the audio of the financial report teleconference by normalizing the waveform amplitude to eliminate the influence of recording equipment or volume differences; and converting the audio format of the financial report teleconference, for example to WAV or FLAC, for subsequent feature extraction.
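As a concrete illustration, the framing and normalization of step S10 can be sketched as follows; the 16 kHz sampling rate and the sine-wave input are assumed example values, not part of the scheme:

```python
import numpy as np

def preprocess_audio(waveform, sr=16000, frame_ms=25.0, overlap_ms=10.0):
    """Split a mono waveform into fixed-length overlapping frames.

    Frame length (25 ms) and overlap (10 ms) follow step S10; the
    sampling rate is an assumed example value.
    """
    peak = np.max(np.abs(waveform))
    if peak > 0:                       # amplitude normalization removes
        waveform = waveform / peak     # recording-device / volume differences
    frame_len = int(sr * frame_ms / 1000)            # samples per frame
    hop = frame_len - int(sr * overlap_ms / 1000)    # stride between frame starts
    n = 1 + max(0, (len(waveform) - frame_len) // hop)
    return np.stack([waveform[i * hop:i * hop + frame_len] for i in range(n)])

t = np.arange(16000) / 16000.0                        # 1 s of audio at 16 kHz
frames = preprocess_audio(0.5 * np.sin(2 * np.pi * 440 * t))
```

Each row of `frames` is one 25 ms frame, ready to be embedded frame by frame in step S11.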
S11, extracting an embedded vector of the audio of the financial report teleconference by using the WENETSPEECH pre-trained model: audio preprocessing is performed on a section of audio A_c = {a_1, a_2, ..., a_n}, wherein a_i represents the i-th frame of the audio and n represents the number of frames in the audio; each frame of audio is converted into a vector representation e_i, thereby obtaining the audio embedded vector E_ac = {e_1, e_2, ..., e_n} of the whole audio A_c. The pre-trained model WENETSPEECH adopts a Transformer architecture and is trained on a large-scale voice dataset, so that it can effectively extract hidden features of speech and is well suited to this natural language processing task.
S12, the audio embedded vector E_ac is sent to a multi-head self-attention module MHSA to further extract the feature vector of the audio, T_ac = MHSA(E_ac). The multi-head self-attention module (Multi-Head Self-Attention) is a conventional technical means in the art; it usually adopts 8 or 16 attention heads to process the input audio embedded vector in parallel, with each attention head attending in a different way to the global dependency relationships among audio frames.
The multi-head self-attention module MHSA processes the input audio embedded vector in parallel using a plurality of attention heads, and calculates the attention weight of the audio embedded vector using the following formula: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, wherein the query vector Q = E_ac W_Q, the key vector K = E_ac W_K and the value vector V = E_ac W_V; W_Q, W_K and W_V are linear projection matrices, d_k is the dimension of the attention heads, and the softmax function is used for normalization; the outputs of the attention heads are spliced and passed through a linear transformation to obtain the audio feature vector T_ac = Concat(head_1, head_2, ..., head_h) W_O, wherein the subscripts 1, 2, ..., h distinguish the different attention heads, W_O is the output linear transformation matrix, and Concat is the concatenation function.
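The attention computation above can be sketched in a minimal form; the dimensions, random weights and head count of 8 are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(E_ac, W_Q, W_K, W_V, W_O, h=8):
    """Multi-head self-attention over the audio embedded vectors E_ac.

    E_ac: (n_frames, d_model). Each head i uses a d_k = d_model // h slice
    of the projected Q, K, V and computes softmax(Q K^T / sqrt(d_k)) V;
    the head outputs are concatenated and multiplied by W_O.
    """
    n, d_model = E_ac.shape
    d_k = d_model // h
    Q, K, V = E_ac @ W_Q, E_ac @ W_K, E_ac @ W_V
    heads = []
    for i in range(h):
        s = slice(i * d_k, (i + 1) * d_k)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d_k))   # attention weights
        heads.append(A @ V[:, s])
    return np.concatenate(heads, axis=1) @ W_O            # T_ac

rng = np.random.default_rng(0)
d = 64
W = [rng.normal(0, 0.1, (d, d)) for _ in range(4)]        # W_Q, W_K, W_V, W_O
E_ac = rng.normal(size=(10, d))                           # 10 frames, 64-dim
T_ac = mhsa(E_ac, *W, h=8)
```

The output T_ac keeps one row per frame, so the subsequent average pooling layer can compress it to a single vector.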
S13, sending the feature vector T_ac to an average pooling layer (Average Pooling Layer) to obtain the compressed audio feature vector T_a, T_a = AveragePooling(T_ac).
In order to preserve a sufficient amount of feature information, as a preferred embodiment, the vector representation e_i of each frame of audio is 512-dimensional and the dimension of the compressed audio feature vector T_a is 512: the average pooling layer takes the mean over each dimension of the audio feature vector T_ac, compressing it into a single 512-dimensional vector. This pooling operation effectively reduces the feature dimension while retaining global voice information, and the resulting compressed audio feature vector T_a can represent implicit information contained in the audio data, such as intonation, emotion and speech speed, providing high-quality input for the subsequent multi-modal feature fusion.
In addition, the dimension of the text summary feature vector is 768, the dimension of the embedded vector corresponding to the summary of the text summary is 768, and the dimension of the joint representation vector is 512.
The invention fully utilizes the unstructured data characteristics of the financial teleconference audio to extract the audio characteristics with high dimension and information density, thereby providing important support for the financial risk prediction model.
S2, extracting a text summary feature vector from the financial report teleconference of the target company.
The specific content of the step S2 is as follows:
S21, preprocessing the text summary of the financial report teleconference of the target company to obtain a sentence collection T_c = {t_c^1, t_c^2, ..., t_c^L}, wherein t_c^l represents the l-th sentence in the text summary, l = 1, 2, ..., L, and L represents the number of sentences in the text summary; using the Sentence-BERT pre-trained language model, abbreviated SBERT, each sentence t_c^l is mapped to a 768-dimensional embedded vector e_t^l of the financial report teleconference text summary, thereby obtaining the vector representation E_tc = {e_t^1, e_t^2, ..., e_t^L} of the text summary of the whole financial report teleconference. SBERT is a sentence-level semantic representation model based on the BERT architecture; sentence pairs are contrastively learned through a Siamese network, optimizing its ability to capture semantic similarity between sentences.
The text preprocessing standardizes the text summary of the financial report teleconference of the target company: redundant characters are cleared and formats are unified; meaningless stop words, such as modal particles and filler sounds ("uh", "um"), are removed; and the text summary is split into sentences according to punctuation marks such as periods and semicolons, yielding the sentence collection.
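A minimal sketch of this text preprocessing, assuming English text and an illustrative filler-word list (the scheme itself targets whatever language the summary is in):

```python
import re

FILLERS = {"uh", "um", "er"}   # illustrative modal/filler stop words

def preprocess_summary(text):
    """Normalize a teleconference text summary and split it into sentences.

    Sentence boundaries follow the punctuation named in the text (periods,
    semicolons); the filler-word list is a stand-in example.
    """
    text = re.sub(r"\s+", " ", text).strip()              # unify format
    sentences = [s.strip() for s in re.split(r"[.;]", text) if s.strip()]
    cleaned = []
    for s in sentences:
        words = [w for w in s.split() if w.strip(",").lower() not in FILLERS]
        if words:
            cleaned.append(" ".join(words))
    return cleaned

sents = preprocess_summary("Uh, revenue grew 12%; margins, um, held steady.  Guidance unchanged.")
```

Each cleaned sentence would then be passed to SBERT to obtain its embedded vector e_t^l.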
S22, the multi-head self-attention module MHSA here is similar to that in step S12, but its function is to capture the global dependency relationships among sentences: a plurality of attention heads are used, each focusing on different semantic relationships, and after the outputs of the attention heads are spliced, a linear transformation generates the context-enhanced text summary feature vector T_tc.
S23, sending the text summary feature vector T_tc into an average pooling layer (Average Pooling Layer) to obtain the compressed text summary feature vector T_t = AveragePooling(T_tc).
By taking the average over each dimension of the text summary feature vector T_tc, the method compresses it into a single 768-dimensional text summary feature vector that represents the comprehensive semantic information of the text summary of the whole financial report teleconference. The semantic and contextual relationships in the text are fully preserved, including the key financial information, the semantic sentiment and the logical associations between sentences, providing high-quality text input for the subsequent multi-modal feature fusion. By combining contextual features, the method has a notable advantage in capturing the implicit information of the financial report teleconference text summary, providing an important data basis for financial risk prediction.
S3, summarizing the text summary of the financial report teleconference of the target company by using a large language model to obtain an embedded vector corresponding to the summary of the text summary: the core content is extracted by segmenting and summarizing the text summary of the financial report teleconference, and the embedded vector corresponding to the summary of the text summary is generated for subsequent use.
The specific content of the step S3 is as follows:
S31, paragraph segmentation: segmenting the text summary of the financial report teleconference of the target company according to logical paragraphs to obtain paragraphs P = {p_1, p_2, ..., p_M}, m = 1, 2, ..., M, wherein M represents the number of paragraphs; each paragraph p_m is input into the large language model LLM to extract a paragraph-level abstract s_m = LLM(p_m), where s_m is the abstract of paragraph p_m.
S32, merging all paragraph summaries {s_1, s_2, ..., s_M} into a whole text and inputting it into the large language model to generate a comprehensive summary S = LLM({s_1, s_2, ..., s_M}); after the comprehensive summary is generated, its granularity can be further adjusted as required, for example by controlling the length of a longer text summary or removing redundant information.
S33, vectorizing the comprehensive summary S by using the Sentence-BERT pre-trained language model to generate the embedded vector corresponding to the summary of the text summary, T_l = SBERT(S). The embedded vector T_l is a 768-dimensional feature vector representing the semantic information of the text summary; it fuses the core content and the global information of the financial report teleconference text and provides important support for the multi-modal feature fusion.
Through the steps, after paragraph segmentation, paragraph level summarization and comprehensive summarization are carried out on the summary of the financial and newspaper teleconference text, the key content of the text is effectively summarized by the generated embedded vector T l, and the utilization efficiency and prediction precision of the financial risk prediction model on the text data can be remarkably improved.
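The two-level summarization of steps S31-S33 can be sketched as follows; `llm` is a placeholder callable standing in for any large language model, and the toy stand-in below only illustrates the data flow:

```python
def summarize_hierarchically(paragraphs, llm):
    """Paragraph-level then global summarization, mirroring steps S31-S33.

    `llm` is a placeholder callable standing in for the large language
    model; any text-in/text-out function fits this sketch.
    """
    para_summaries = [llm("Summarize this paragraph: " + p) for p in paragraphs]
    merged = "\n".join(para_summaries)                 # merge into whole text
    return llm("Combine into one comprehensive summary: " + merged)

# toy stand-in "LLM": keeps the first sentence of the text after the prompt
toy_llm = lambda prompt: prompt.split(": ", 1)[1].split(".")[0]
S = summarize_hierarchically(
    ["Revenue rose sharply. More detail follows.",
     "Costs fell slightly. More detail follows."], toy_llm)
```

The comprehensive summary S is then vectorized with SBERT to obtain T_l as in step S33.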
S4, extracting news text features of the target company to obtain a news text feature vector: semantic analysis and similarity calculation are performed on news texts related to the target company, generating the news text feature vector for the days before the target transaction day.
The specific content of the step S4 is as follows:
S41, collecting news text data of the target company from several days before the target transaction day, represented as N = {n_1, n_2, ..., n_K}, wherein n_k represents the k-th news text and K is the total number of news items.
S42, analyzing each news item n_k by using the large language model LLM to extract its metadata m_k = LLM(n_k); the extracted metadata includes the sentiment tendency (positive, negative or neutral), the financial indexes related to the news (such as income and profit) and other key information related to the target company. The metadata of all news items is integrated into one whole metadata set M_N = {m_1, m_2, ..., m_K}.
S43, finding, in the historical news dataset, the k historical news groups N_1, N_2, ..., N_k whose metadata are most similar to that of the news text data N from several days before the target transaction day; calculating the semantic relevance between N and each historical news group from their embedded vectors, where f is the function converting news texts into embedded vectors; and selecting the group of historical news H with the highest semantic similarity to the news text data N from several days before the target transaction day.
S44, splicing N, H and the market-trend-related texts observed after H occurred, and then converting the spliced text into an embedded vector T_n using the SBERT model, which serves as the news text feature vector of the target company for the days before the target transaction day.
The above corresponds to texts from three sources: (1) the news text data N from several days before the target transaction day; (2) the historical news H; and (3) the market-trend-related texts after the historical news H occurred, that is, news published after H. For example, if the historical news H of January 1 reports the debt crisis of company A, and the news of January 2 reports that the stock price of company A dropped sharply, then the January 2 news is the market-trend-related text after H occurred.
Through the step S4, the generated news text feature vector T n of the target company several days before the target transaction day effectively fuses semantic information, emotion analysis results and similar relations with the historical news of the target news, and provides high-quality input features for subsequent multi-modal feature fusion.
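The retrieval of the most similar historical news (step S43) can be sketched with a toy embedding; cosine similarity is used here as a common choice of relevance measure, since the exact formula is not reproduced above:

```python
import numpy as np

VOCAB = ["debt", "crisis", "profit", "growth"]   # toy vocabulary

def embed(text):
    """Toy bag-of-words embedding standing in for the f(.) / SBERT mapping."""
    return np.array([text.count(w) for w in VOCAB], float) + 1e-9

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_history(N, history_groups, f=embed):
    """Pick the historical news group H most semantically similar to N
    (step S43); cosine similarity between embeddings is an assumed choice."""
    scores = [cosine(f(N), f(g)) for g in history_groups]
    return history_groups[int(np.argmax(scores))]

H = select_history("debt crisis at company A",
                   ["profit growth announcement",
                    "debt crisis and default fears"])
```

In a full implementation, `embed` would be replaced by the SBERT model and the candidates by metadata-filtered historical news groups.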
And S5, carrying out feature extraction on time sequence transaction data of the target company in a period before the target date to obtain feature vectors of the time sequence data. As a preferred embodiment, this step extracts key features by processing time series transaction data of the target company 30 days before the target date, and captures dynamic relationships between different time series features.
The specific content of the step S5 is as follows:
S51, collecting the time sequence transaction data D of the target company for the 30 days before the target date, including the daily closing price and daily deal volume, expressed as D = {(p_1, v_1), (p_2, v_2), ..., (p_d, v_d)}, wherein p_f represents the closing price on the f-th day and v_f represents the deal volume on the f-th day, f = 1, 2, ..., d;
S52, inputting the time sequence transaction data D into a bi-directional long short-term memory network Bi-LSTM, which models the time sequence data, captures the temporal dynamic characteristics of the transaction data, and outputs a 128-dimensional feature vector of the time sequence data, T_v = BiLSTM(D);
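A minimal numpy sketch of the Bi-LSTM feature extraction; the random weights and synthetic 30-day series are illustrative, and a production system would use a trained library implementation:

```python
import numpy as np

def lstm_last_hidden(X, Wx, Wh, b):
    """Run a single-direction LSTM over X (T, d_in); return the final hidden state."""
    d_h = Wh.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in X:
        z = x @ Wx + h @ Wh + b                 # stacked gates: i, f, o, g
        i, f = sig(z[:d_h]), sig(z[d_h:2 * d_h])
        o, g = sig(z[2 * d_h:3 * d_h]), np.tanh(z[3 * d_h:])
        c = f * c + i * g                       # cell state update
        h = o * np.tanh(c)                      # hidden state update
    return h

def bilstm_features(D, fwd, bwd):
    """T_v for the 30-day (price, volume) series D: concatenation of the
    forward and backward final hidden states (a minimal Bi-LSTM sketch)."""
    return np.concatenate([lstm_last_hidden(D, *fwd),
                           lstm_last_hidden(D[::-1], *bwd)])

rng = np.random.default_rng(1)
d_in, d_h = 2, 64                    # (closing price, deal volume) -> 128-dim T_v
make = lambda: (rng.normal(0, 0.1, (d_in, 4 * d_h)),
                rng.normal(0, 0.1, (d_h, 4 * d_h)),
                np.zeros(4 * d_h))
D = rng.normal(size=(30, d_in))      # 30 days of normalized transaction data
T_v = bilstm_features(D, make(), make())
```

Concatenating the two directions yields the 128-dimensional T_v mentioned in step S52 when each direction has 64 hidden units.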
S53, capturing dynamic relations among different time sequence features through a vector autoregressive VAR model:
log(σ_{3,t}) = α_3 + β_{1,1} log(σ_{-3,t}) + β_{1,2} log(σ_{-7,t}) + β_{1,3} log(σ_{-15,t}) + β_{1,4} log(σ_{-30,t}) + u_{3,t};
log(σ_{7,t}) = α_7 + β_{2,1} log(σ_{-3,t}) + β_{2,2} log(σ_{-7,t}) + β_{2,3} log(σ_{-15,t}) + β_{2,4} log(σ_{-30,t}) + u_{7,t};
log(σ_{15,t}) = α_{15} + β_{3,1} log(σ_{-3,t}) + β_{3,2} log(σ_{-7,t}) + β_{3,3} log(σ_{-15,t}) + β_{3,4} log(σ_{-30,t}) + u_{15,t};
log(σ_{30,t}) = α_{30} + β_{4,1} log(σ_{-3,t}) + β_{4,2} log(σ_{-7,t}) + β_{4,3} log(σ_{-15,t}) + β_{4,4} log(σ_{-30,t}) + u_{30,t};
wherein σ_{z,t} represents the volatility of the stock price of the target company over z days, z ∈ {3, 7, 15, 30}, and σ_{-z,t} is the corresponding volatility over the preceding z days; u_{z,t} is a white noise term, β_{a,b} (a, b = 1, 2, 3, 4) is the coefficient matrix of the dynamic relationship, and α_z is an intercept term; the volatility of the stock price is defined as the standard deviation of the daily return rate of the target company over the z days: σ_{z,t} = sqrt((1/(z−1)) Σ_{i=1}^{z} (r_{t−i} − r̄)^2), wherein r_{t−i} is the daily return rate and r̄ is its mean over the z days. The numbers of days z = 3, 7, 15 and 30 here are merely exemplary and do not limit the scheme itself.
Through this step S5, the time series transaction data is converted into the high-dimensional feature vector T v and the parameters of the dynamic relationship, providing key time series information and dynamic relationship support for financial risk prediction.
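The volatility definition and the fitting of one VAR equation can be sketched as follows; the synthetic returns are illustrative, and ordinary least squares stands in for whatever estimator an implementation would use:

```python
import numpy as np

def realized_vol(returns, z):
    """sigma_z: standard deviation of the daily return rate over the
    trailing z days (the volatility definition in step S53)."""
    return float(np.std(returns[-z:], ddof=1))

rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.02, 250)                   # ~one year of synthetic daily returns
sigmas = {z: realized_vol(r, z) for z in (3, 7, 15, 30)}

# Fit one VAR equation, e.g. for log(sigma_{3,t}), by ordinary least squares
# on the four lagged log-volatilities; a full VAR stacks all four equations.
rows, ys = [], []
for t in range(60, 250):
    rows.append([1.0] + [np.log(realized_vol(r[:t], z)) for z in (3, 7, 15, 30)])
    ys.append(np.log(realized_vol(r[:t + 1], 3)))
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(ys), rcond=None)
# coef = [alpha_3, beta_{1,1}, beta_{1,2}, beta_{1,3}, beta_{1,4}]
```

The remaining three equations are fitted the same way with their respective targets log(σ_{7,t}), log(σ_{15,t}) and log(σ_{30,t}).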
And S6, fusing the obtained audio feature vector, the text summary feature vector, the embedded vector corresponding to the summary of the text summary, the news text feature vector and the feature vector of the time sequence data to obtain a joint representation vector.
The joint representation vector E described in step S6 is fused by the following formula: E = w_0 + w_1 T_a + w_2 T_t + w_3 T_l + w_4 T_n + w_5 T_v + ε, where w_0 is the bias term, w_1, w_2, w_3, w_4, w_5 are the fusion weights, and ε is the error term representing random noise.
The dimension of the joint representation vector E is fixed to 512 dimensions, representing the integrated features of the multimodal data. The vector retains important information of each modal feature and can be used for subsequent multi-task prediction.
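A sketch of the fusion step; because the modal vectors have different dimensions, the weighted sum here assumes per-modality projections into the 512-dimensional space, a detail the fusion formula leaves implicit:

```python
import numpy as np

def fuse(T_a, T_t, T_l, T_n, T_v, weights, projections, w0):
    """Joint representation E as a weighted sum of the modal feature vectors.

    The modal vectors have different dimensions (512, 768, 768, 768, 128),
    so each is first projected into the common 512-dim space; these
    projection matrices are an assumption, not stated in the formula.
    """
    E = w0.copy()
    for w, P, T in zip(weights, projections, [T_a, T_t, T_l, T_n, T_v]):
        E += w * (P @ T)                  # scalar fusion weight * projection
    return E                              # 512-dimensional joint representation

rng = np.random.default_rng(3)
dims = [512, 768, 768, 768, 128]          # T_a, T_t, T_l, T_n, T_v
Ts = [rng.normal(size=d) for d in dims]
Ps = [rng.normal(0, 0.01, (512, d)) for d in dims]
E = fuse(*Ts, weights=[0.2] * 5, projections=Ps, w0=np.zeros(512))
```

In training, the weights, projections and bias would be learned jointly with the downstream prediction sub-networks.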
And S7, inputting the joint representation vector into a multi-task learning framework to predict the risk index.
The specific content of step S7 is as follows: the joint representation vector E generated in step S6 is used as the input of the multi-task learning framework, which simultaneously predicts the following risk indexes: the volatility σ_{3,t}, σ_{7,t}, σ_{15,t} and σ_{30,t} of the stock price of the target company over 3 days, 7 days, 15 days and 30 days, and the single-day risk value VAR;
constructing an independent first prediction sub-network MLP for the stock price volatility of each time span, the prediction result of the stock price volatility being σ̂_{z,t} = F_{MLP,z}(E), wherein F_{MLP,z}(·) is the first prediction sub-network MLP for the time span z;
for the single-day risk value VAR, constructing an independent second prediction sub-network MLP whose input is the joint representation vector E and whose output is the predicted value of VAR, V̂ = F_{MLP,VAR}(E), wherein F_{MLP,VAR}(·) is the second prediction sub-network MLP;
the joint loss function simultaneously optimizes the stock price volatility and VAR prediction tasks, and its formula is: L = L_vol + μ·L_VAR, wherein μ is a weight hyperparameter for balancing the stock price volatility prediction error and the VAR prediction error; L_vol = (1/J) Σ_{j=1}^{J} (y_j − ŷ_j)^2 represents the mean square error of the stock price volatility prediction, with y_j and ŷ_j representing the true value and the predicted value of the stock price volatility index; L_VAR = max(q(V − V̂), (q − 1)(V − V̂)) represents the quantile regression loss function of the single-day risk value prediction task, wherein q represents the quantile threshold of the single-day risk value, and V and V̂ respectively represent the real single-day risk value and the predicted single-day risk value.
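The joint loss can be sketched with the standard pinball loss for the quantile term; this concrete form is an assumption where the original formula images are not reproduced:

```python
import numpy as np

def quantile_loss(V, V_hat, q):
    """Pinball (quantile regression) loss for the single-day risk value VAR;
    written in its standard form max(q*u, (q-1)*u) with u = V - V_hat."""
    u = np.asarray(V) - np.asarray(V_hat)
    return float(np.mean(np.maximum(q * u, (q - 1.0) * u)))

def joint_loss(y, y_hat, V, V_hat, q=0.05, mu=1.0):
    """L = MSE over the volatility predictions + mu * quantile loss on VAR."""
    mse = float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))
    return mse + mu * quantile_loss(V, V_hat, q)

# perfect predictions give zero loss; an over-pessimistic VAR is penalized by q
L0 = joint_loss([0.02, 0.03, 0.04, 0.05], [0.02, 0.03, 0.04, 0.05], [-0.01], [-0.01])
L1 = joint_loss([0.02], [0.02], [0.0], [-1.0])
```

The asymmetry of the pinball loss makes under- and over-estimation of the risk value cost differently, which is what makes it suitable for quantile (VAR) targets.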
Through the step S7, the multi-mode features can be effectively integrated by the multi-task learning framework, the fluctuation rate and the single-day risk value of different time spans can be accurately predicted, and reliable support is provided for financial risk management.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.