
CN119203100A - A smart speaker permission design management method and system - Google Patents


Info

Publication number
CN119203100A
CN119203100A
Authority
CN
China
Prior art keywords
user
permission
authority
feature vector
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411217025.7A
Other languages
Chinese (zh)
Other versions
CN119203100B (en)
Inventor
程佳能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ruigao Intelligent System Co ltd
Original Assignee
Guangzhou Ruigao Intelligent System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ruigao Intelligent System Co ltd filed Critical Guangzhou Ruigao Intelligent System Co ltd
Priority to CN202411217025.7A priority Critical patent/CN119203100B/en
Publication of CN119203100A publication Critical patent/CN119203100A/en
Application granted
Publication of CN119203100B publication Critical patent/CN119203100B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468Fuzzy queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • G06F16/636Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Physiology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a smart speaker permission design management method and system. The method comprises: collecting user audio data and preprocessing it; constructing a user-context joint feature vector and tuning recognition accuracy; designing a fuzzy matching and dynamic adjustment model to manage smart speaker permissions in real time when complex voice environments with multiple languages and accents arise; constructing a behavior monitoring and anomaly detection model that analyzes user permissions in real time through feature engineering and time-series modeling, dynamically adjusting the permission level according to the anomaly detection and behavior monitoring results; providing a mapping function from emotional state to permission response that dynamically adjusts the permission level and immediately applies the new permission configuration to the smart speaker; and constructing a reinforcement-learning-based adaptive permission optimization model that adaptively learns the speaker's permission strategy for long-term policy optimization. The invention meets the broad application requirements of smart speakers in diverse scenarios such as homes and offices.

Description

Smart speaker permission design management method and system
Technical Field
The invention belongs to the technical field of permission design management, and in particular relates to a smart speaker permission design management method and system.
Background
With the rapid development of Internet of Things (IoT) technology, smart speakers have become indispensable devices in home and office environments. Through voice interaction they provide users with many convenient services, such as music playback, information queries, and smart-home control. However, the widespread use of smart speakers has also raised problems concerning privacy protection, permission management, and multi-user collaboration. Existing smart speakers mainly control device usage through simple static permission settings, an approach with several limitations.
First, existing static permission management lacks dynamism and intelligence and cannot adapt to the complex requirements of multiple users and scenarios. In a home environment, different members may have different usage and permission requirements for the speaker's functions. For example, children and adults should have different content-access permissions, yet static settings cannot flexibly accommodate these differences, often allowing children to access unsuitable content or preventing adults from reaching the functions they need in time. Second, traditional static rule-based permission management cannot provide a personalized experience: it lacks deep understanding of and learning from individual user behaviors and usage habits, resulting in a poor user experience. As a device used with extremely high daily frequency, the smart speaker should offer stronger user-experience optimization.
In addition, as smart speakers spread through home, office, and other environments, their permission management faces more serious privacy and security challenges. Traditional permission settings usually require users to adjust permissions manually through tedious, error-prone operations. More seriously, speakers are often placed in shared environments where unauthorized users may perform unauthorized operations through voice commands, posing significant security risks. For example, unauthorized users could access confidential information, control smart-home devices, or even make improper voice purchases, causing property and privacy losses. Existing solutions to these security problems are mostly after-the-fact remedies and lack a proactive protection strategy.
Finally, smart speaker permission management lacks an effective mechanism for multi-user scenarios; especially when several people use the device at once, distinguishing user identities and providing personalized services is a major challenge. Current permission systems can only distinguish simple user roles or rely on manual login to switch users, which is inconvenient and unreliable in practice and cannot meet diverse usage requirements. When users of different ages and roles take turns using the device, the prior art cannot provide an intelligent, dynamic permission management scheme, limiting both the speaker's functionality and its user experience.
Therefore, current smart speaker permission management technology has obvious deficiencies, chiefly a lack of dynamic intelligence, personalized experience, and security guarantees, together with insufficient support for multi-user scenarios. These problems severely limit the potential deployment and market applications of smart speakers.
Disclosure of Invention
The invention aims to design a smart speaker permission design management method and system that introduces advanced artificial-intelligence techniques and data-analysis methods, overcomes the shortcomings of the prior art, and provides a more intelligent permission management system with high security and an excellent user experience.
To achieve the above object, a first aspect of the present invention provides a smart speaker permission design management method, the method comprising:
S1, collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model from the preprocessed data, and outputting an audio signature;
S2, taking the audio signature as input to represent the user's identity features, collecting and preprocessing real-time environment data, constructing a user-context joint feature vector from the user identity features and the preprocessed environment data, adding a regularization term to the joint feature vector to tune recognition accuracy, and then performing context awareness and dynamic permission configuration initialization to obtain an optimized permission level and permission score;
S3, according to the optimized permission level and permission score, if a complex voice environment with multiple languages and accents arises, designing a fuzzy matching and dynamic adjustment model for real-time dynamic management of smart speaker permissions, wherein the fuzzy matching and dynamic adjustment model computes a credibility score γ for the voice feature vector C_new using a feature-weighted aggregation method;
where γ denotes the credibility score of the voice input, representing how well the current voice features match; M is the total number of voice features; w_j is the weight of the j-th feature, representing its importance in computing voice credibility; α_j is the fuzzy adjustment parameter of the j-th feature, controlling the sensitivity of feature matching; C_new,j is the j-th component of the voice feature vector; and μ_j is the expected value of the j-th feature, its typical value under normal conditions;
using the credibility score γ of the voice input and the optimized permission score, the user's permission level is dynamically adjusted by a permission adjustment function;
where P'_u denotes the permission level after dynamic adjustment, i.e., the permission setting adjusted according to the voice credibility and the original permission score; P_u denotes the initial permission level output by step S2; η is a permission-adjustment gain coefficient controlling the adjustment amplitude; γ is the credibility score of the voice input; δ is an intermediate threshold for the credibility score; S_u denotes the optimized permission score; and τ is the reference threshold for the permission score. According to the new permission level P'_u, the smart speaker system adjusts the user's permission configuration in real time and dynamically changes which functions are accessible;
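The formula for γ is rendered as an image in the published patent and is not reproduced in this text; a reconstruction consistent with the symbol definitions above (assuming an exponential fuzzy-membership kernel, which is one plausible choice and not confirmed by the source) is:

```latex
\gamma = \sum_{j=1}^{M} w_j \exp\!\big(-\alpha_j \,(C_{\mathrm{new},j} - \mu_j)^2\big)
```

Under this form, features whose components lie close to their expected values μ_j contribute most to γ, with α_j sharpening or flattening each feature's tolerance.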
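The permission adjustment function is likewise missing from this text; a sketch consistent with the listed symbols (gain η, credibility threshold δ, score threshold τ; the exact functional form in the patent may differ) is:

```latex
P'_u = P_u + \eta\,(\gamma - \delta)\,(S_u - \tau)
```

so the adjustment is proportional both to how far the voice credibility γ sits from the threshold δ and to how far the optimized score S_u sits from the reference τ.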
S4, according to the new permission level optimized for the complex voice environment, combined with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permissions in real time through feature engineering and time-series modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting the permission level according to the anomaly detection and behavior monitoring results;
S5, according to the permission level and the user's emotional state, classifying the emotional state through an emotion classifier model and generating an emotion label; according to the emotion-state label and the abnormal-behavior response strategy, providing a mapping function from emotional state to permission response that dynamically adjusts the permission level, computing the new permission level from this mapping function, and having the smart speaker immediately apply the new permission configuration;
S6, according to the permission level, constructing a reinforcement-learning-based adaptive permission optimization model and, combined with the user's long-term behaviors and usage patterns, adaptively learning the smart speaker's permission strategy through reinforcement learning for long-term policy optimization.
Preferably, the preprocessing denoises and normalizes the collected audio data x(t);
The probability density function of the Gaussian mixture model is expressed as:
p(C|Θ) = Σ_{k=1}^{K} π_k · N(C; μ_k, Σ_k)
where C denotes the extracted Mel-frequency cepstral coefficient (MFCC) feature vector, used to represent the voiceprint features of the audio; Θ = {π_k, μ_k, Σ_k} denotes the parameter set of the Gaussian mixture model; π_k is the mixing weight of the k-th Gaussian component, satisfying Σ_k π_k = 1 and π_k ≥ 0; μ_k is the mean vector and Σ_k the covariance matrix of the k-th Gaussian component; K is the number of Gaussian components, i.e., the number of Gaussian distributions in the model; and N(C; μ_k, Σ_k) denotes the multivariate normal distribution:
N(C; μ, Σ) = (2π)^(−d/2) · |Σ|^(−1/2) · exp(−(1/2)(C − μ)ᵀ Σ^(−1) (C − μ))
where d denotes the dimension of the feature vector C.
Preferably, for each new audio input, the user identity is identified by computing the log-likelihood of the new audio's feature vectors under each user model:
logL(C_new|S_i) = Σ_{t=1}^{T} log P(C_new(t)|Θ_i)
where logL(C_new|S_i) is the log-likelihood of the new feature vectors under the i-th user model, measuring how well the audio input matches that model; C_new is the feature-vector sequence of the new audio input; S_i is the audio signature of the i-th user, i.e., the GMM parameter set Θ_i; T is the total number of frames of the new audio input after segmentation of the audio signal; and P(C_new(t)|Θ_i) is the probability of the t-th frame's feature vector C_new(t) under user model Θ_i. The identified user is the one whose model maximizes the log-likelihood.
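The GMM scoring and log-likelihood identification described above can be sketched in Python with NumPy (full-covariance components; enrollment and EM fitting of the per-user models are out of scope here, so the parameter sets are taken as given):

```python
import numpy as np

def mvn_pdf(c, mu, cov):
    # Multivariate normal density N(c; mu, Sigma)
    d = len(mu)
    diff = c - mu
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / \
        np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def gmm_pdf(c, weights, means, covs):
    # p(C | Theta) = sum_k pi_k * N(C; mu_k, Sigma_k)
    return sum(w * mvn_pdf(c, m, s) for w, m, s in zip(weights, means, covs))

def log_likelihood(frames, theta):
    # logL(C_new | S_i) = sum_t log p(C_new(t) | Theta_i)
    weights, means, covs = theta
    return sum(np.log(gmm_pdf(f, weights, means, covs)) for f in frames)

def identify_user(frames, user_models):
    # Pick the enrolled user whose GMM best explains the new audio frames
    return max(user_models, key=lambda uid: log_likelihood(frames, user_models[uid]))
```

With a single zero-mean unit-covariance component, the density at the mean is 1/(2π)^(d/2), a quick sanity check for the implementation.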
Preferably, the real-time environment data is collected through various sensors, wherein the sensors comprise a microphone, a camera, a light sensor, a temperature and humidity sensor and an accelerometer, and a real-time environment feature vector is expressed as E;
The user-context joint feature vector is fused by a weighted multi-layer perceptron model, expressed as follows:
Ce=σ(W2·σ(W1·[Enorm,Si]+b1)+b2)
where C_e denotes the generated context feature vector, representing the user's behavior features in the current environment; [E_norm, S_i] is the input feature obtained by concatenating the normalized environment feature vector with the user audio signature; W_1 and W_2 are weight matrices of dimensions m×(n+1) and m×m, learned through model training; b_1 and b_2 are bias vectors of dimension m, learned through model training; and σ is a nonlinear activation function used to capture nonlinear relations;
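The fusion step above can be sketched in Python (a toy sketch: the dimensions and the choice of sigmoid as the activation σ are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_context(e_norm, s_i, W1, b1, W2, b2):
    # C_e = sigma(W2 . sigma(W1 . [E_norm, S_i] + b1) + b2)
    x = np.concatenate([e_norm, s_i])   # joint input [E_norm, S_i]
    h = sigmoid(W1 @ x + b1)            # hidden layer
    return sigmoid(W2 @ h + b2)         # context feature vector C_e
```

With all-zero weights and biases the output is 0.5 everywhere (sigmoid of zero), a quick sanity check.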
Combining the context feature vector C_e and the user audio signature S_i yields the user-context joint feature vector F = [S_i, C_e];
adding a regularization term to the user-context joint feature vector F = [S_i, C_e] achieves higher recognition accuracy through a weighted combination of user and environment features, as follows:
Fopt=F+λ·(F⊙Wreg)
where F_opt denotes the optimized joint feature vector; λ is the regularization parameter balancing the influence of the original features and the regularization term; ⊙ denotes the element-wise product used for element-by-element weighting; and W_reg is a weight matrix encoding the interrelationships between the features.
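The regularized combination F_opt = F + λ·(F ⊙ W_reg) is a one-liner; a minimal sketch:

```python
import numpy as np

def regularize_features(F, W_reg, lam):
    # F_opt = F + lambda * (F ⊙ W_reg); ⊙ is the element-wise product
    return F + lam * F * W_reg
```

For F = [1, 2], W_reg = [1, 0.5], and λ = 0.1, each component is scaled by its own regularization weight before being added back.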
Preferably, the configuration initialization includes:
designing a permission scoring function from the adjusted user-context joint feature vector to compute the user's permission score in the current context;
where the permission score reflects the user's permission level in the current context; β is a weight-adjustment coefficient controlling the overall weight of the joint features; the two-norm of the joint features serves as a measure of feature importance, adding a balancing constraint between features; and a bias term completes the score;
the permission level is then dynamically adjusted by comparing the computed permission score against a preset permission-level threshold via a permission adjustment function;
where the dynamically adjusted permission level results from this comparison, and the permission threshold is set according to user requirements and system configuration.
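The published scoring and adjustment functions are rendered as images and not reproduced here; this sketch assumes a sigmoid of the weighted feature two-norm plus bias, and illustrative thresholds — the names `permission_score`, `permission_level`, and the level labels are all hypothetical:

```python
import numpy as np

def permission_score(f_opt, beta, bias):
    # Hypothetical scoring: sigmoid of beta * ||F_opt||_2 + bias
    return 1.0 / (1.0 + np.exp(-(beta * np.linalg.norm(f_opt) + bias)))

def permission_level(score, thresholds=(0.3, 0.7)):
    # Map the score onto discrete levels via preset thresholds (illustrative values)
    lo, hi = thresholds
    if score < lo:
        return "guest"
    if score < hi:
        return "standard"
    return "admin"
```

In practice the thresholds would come from user requirements and system configuration, as the text states.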
Preferably, in S3, the fuzzy matching and dynamic adjustment model includes an error-correction feedback mechanism that dynamically adjusts the weights and parameters of the fuzzy matching model. The correction is expressed as:
Δw=ε·(γtarget−γ)·Cnew
where Δw is the weight-adjustment vector used to modify the speech-feature weights; ε is the learning rate controlling the step size of the weight adjustment; γ_target is the target credibility score, typically set to the confidence level the system desires; γ is the credibility score of the current speech input; and C_new is the feature vector of the current speech input.
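The update Δw = ε·(γ_target − γ)·C_new is a simple delta rule; a minimal sketch:

```python
import numpy as np

def correction_step(weights, c_new, gamma, gamma_target, eps):
    # New weights after one error-correction step:
    # w <- w + eps * (gamma_target - gamma) * C_new
    return weights + eps * (gamma_target - gamma) * c_new
```

When the current credibility γ falls short of γ_target, weights grow in proportion to the feature components; when it overshoots, they shrink.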
Preferably, a monitoring and anomaly detection model based on a long short-term memory (LSTM) network is constructed from the weighted behavior feature vector B_w; it predicts the behavior features at the next time step and identifies anomalous behavior patterns. The state update of the behavior monitoring and anomaly detection model is expressed as:
ht=fLSTM(Bw,t,ht-1)
where h_t is the hidden state at the current time t, capturing the temporal dependence of user behavior; B_w,t is the weighted behavior feature vector at time t; h_{t−1} is the hidden state at the previous time t−1; and f_LSTM is the state-update function of the long short-term memory network;
whether anomalous behavior exists is judged by the residual between actual and predicted behavior features, with the anomaly score ξ expressed as:
ξ = ||B_w,t − B̂_w,t||² / σ²
where ξ is the anomaly score measuring the degree of difference between the current and predicted behavior; B_w,t is the actual weighted behavior feature vector at time t; B̂_w,t is the predicted behavior feature vector; and σ² is the variance of the prediction residual, used to normalize the anomaly score;
whether to trigger a permission adjustment or warning mechanism is judged by comparing the anomaly score ξ with a preset anomaly threshold θ, with the following adjustment strategy:
if ξ ≤ θ, the current permission level is maintained without adjustment;
if ξ > θ, the behavior is flagged as anomalous, and permissions are tightened or the user is prompted to re-authenticate according to the severity of the anomaly.
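The residual-based check above can be sketched as follows (the squared-residual form of ξ is a reconstruction from the definitions; the LSTM predictor is abstracted away as a given prediction):

```python
import numpy as np

def anomaly_score(b_actual, b_pred, var):
    # xi = ||B_w,t - B_pred||^2 / sigma^2  (residual normalized by variance)
    r = b_actual - b_pred
    return float(r @ r) / var

def react(xi, theta):
    # Keep permissions if xi <= theta, otherwise tighten / re-authenticate
    return "keep" if xi <= theta else "tighten"
```

The threshold θ trades false alarms against missed anomalies and would be tuned on observed residuals.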
Preferably, collecting the user's emotional state includes collecting audio and video features, after which a fusion model based on a self-attention mechanism captures the interaction between the multi-modal features, with the formula:
E = softmax((A·W_A)(V·W_V)ᵀ / √d_k) · (V·W_V)
where E is the fused emotion feature vector, used to describe the user's current overall emotional state; A is the audio feature vector of dimension m, and V is the video feature vector of dimension n; W_A and W_V are projection matrices of dimensions m×d_k and n×d_k that map the audio and video features into the same space; d_k is the dimension of the key vectors in the attention mechanism, used to scale the dot-product operation; and softmax is the normalization function used to compute the attention weights;
the fused emotion feature vector E is classified by an emotion classifier model C(E) to generate an emotion label c, expressed as:
c = argmax_i softmax(w_iᵀ·E + b_i)
where w_i is the weight vector of the classifier model corresponding to the i-th class, of dimension d_k, and b_i is the bias term of the classifier model corresponding to the i-th class;
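The fusion and classification formulas follow standard scaled dot-product attention; this sketch treats the audio and video features as frame sequences and mean-pools the attended output into a single vector E, both illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_emotion(A, V, W_A, W_V):
    # E = softmax((A·W_A)(V·W_V)^T / sqrt(d_k)) · (V·W_V)
    Qa, Kv = A @ W_A, V @ W_V               # project both modalities to d_k
    d_k = Qa.shape[-1]
    attn = softmax(Qa @ Kv.T / np.sqrt(d_k))
    return (attn @ Kv).mean(axis=0)         # pool over audio frames -> E

def classify_emotion(E, W, b):
    # c = argmax_i softmax(w_i·E + b_i); argmax of logits equals argmax of softmax
    return int(np.argmax(W @ E + b))
```

Because softmax is monotone, taking the argmax over raw logits gives the same label as the normalized probabilities.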
according to the emotion-state label c and the abnormal-behavior response strategy, a mapping function from emotional state to permission response is provided for dynamically adjusting the permission level;
where ρ(c) denotes a permission-adjustment increment function whose value is an integer, positive or negative, depending on the user's emotional state c.
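A minimal sketch of the mapping ρ(c), assuming hypothetical emotion labels and integer increments (the patent does not list concrete label-to-increment pairs):

```python
def adjust_for_emotion(level, label, rho=None):
    # rho(c): hypothetical integer permission increments per emotion label
    if rho is None:
        rho = {"calm": 0, "agitated": -1, "positive": 1}
    return level + rho.get(label, 0)
```

An agitated user is bumped down one level, matching the text's goal of preventing improper actions during emotional fluctuations.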
Preferably, the reinforcement-learning-based adaptive permission optimization model combines the user's current permission level and emotional state with the user's historical behavior pattern and system feedback into a state vector S, where H denotes the feature vector of the user's historical behavior pattern and F the system-feedback feature vector; the actions executable by the system are defined as a set of permission adjustment operations, where each action a_i represents an adjustment to the permission level; reinforcement learning maximizes the expected cumulative reward by optimizing the parameters θ of the policy π_θ(S) through the objective function J(θ):
J(θ) = E[ Σ_t γ^t · r(S_t, a_t) ]
where J(θ) is the optimization objective over the policy parameters θ, representing the expected cumulative reward; S is the current state vector, composed of permission level, emotional state, historical behavior pattern, and system feedback; the expectation is taken over the state distribution, i.e., the possible state space; γ is a discount factor with 0 ≤ γ ≤ 1, weighting the importance of future rewards; r(S_t, a_t) is the immediate reward obtained when action a_t is executed in state S_t, reflecting the validity and rationality of the permission adjustment; and a_t is the action executed at time t, determined by the policy π_θ(S);
during policy optimization, a policy regularization term Ω(θ) is introduced to keep the policy from over-fitting the user's short-term behavior patterns, and the regularized optimization target for the policy parameters θ is updated as:
θ_{t+1} = θ_t + α·(∇_θ J(θ_t) − λ·∇_θ Ω(θ_t))
where θ_{t+1} is the updated policy parameter; θ_t is the policy parameter at the current time t; α is the learning rate controlling the parameter-update step size; ∇_θ J(θ_t) is the policy gradient, i.e., the gradient of the optimization objective with respect to the policy parameters; λ is a regularization-strength parameter controlling how strongly the regularization term affects the policy update; and Ω(θ_t) is the policy regularization term, here a sparsity regularizer encouraging simple, generalizable policy parameters:
Ω(θ) = (1/N)·Σ_{i=1}^{N} |θ_i|
where N denotes the total number of policy parameters and θ_i the i-th policy parameter.
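The regularized update and the sparsity term can be sketched as follows (the 1/N scaling follows the definition of Ω(θ) above; the gradient of J is taken as given rather than estimated from rollouts):

```python
import numpy as np

def l1_reg_grad(theta):
    # Gradient of the sparsity regularizer Omega(theta) = (1/N) * sum_i |theta_i|
    return np.sign(theta) / len(theta)

def update_policy(theta, grad_J, alpha, lam):
    # theta_{t+1} = theta_t + alpha * (grad J(theta_t) - lam * grad Omega(theta_t))
    return theta + alpha * (grad_J - lam * l1_reg_grad(theta))
```

With a zero policy gradient, the update shrinks each parameter toward zero, illustrating how the regularizer discourages over-fitting to short-term behavior.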
In a second aspect of the present invention, there is provided a smart speaker permission design management system, the system comprising:
a user data collection module for collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model from the preprocessed data, and outputting an audio signature;
a configuration initialization module for taking the audio signature as input to represent the user's identity features, collecting and preprocessing real-time environment data, constructing a user-context joint feature vector from the user identity features and the preprocessed environment data, adding a regularization term to the joint feature vector to tune recognition accuracy, and then performing context awareness and dynamic permission configuration initialization to obtain an optimized permission level and permission score;
a permission initialization module for designing, according to the optimized permission level and permission score, a fuzzy matching and dynamic adjustment model for real-time dynamic management of smart speaker permissions if a complex voice environment with multiple languages and accents arises, wherein the fuzzy matching and dynamic adjustment model computes a credibility score γ for the voice feature vector C_new using a feature-weighted aggregation method;
where γ denotes the credibility score of the voice input, representing how well the current voice features match; M is the total number of voice features; w_j is the weight of the j-th feature, representing its importance in computing voice credibility; α_j is the fuzzy adjustment parameter of the j-th feature, controlling the sensitivity of feature matching; C_new,j is the j-th component of the voice feature vector; and μ_j is the expected value of the j-th feature, its typical value under normal conditions;
using the credibility score γ of the voice input and the optimized permission score, the user's permission level is dynamically adjusted by a permission adjustment function;
where P'_u denotes the permission level after dynamic adjustment, i.e., the permission setting adjusted according to the voice credibility and the original permission score; P_u denotes the initial permission level output by the configuration initialization module; η is a permission-adjustment gain coefficient controlling the adjustment amplitude; γ is the credibility score of the voice input; δ is an intermediate threshold for the credibility score; S_u denotes the optimized permission score; and τ is the reference threshold for the permission score. According to the new permission level P'_u, the smart speaker system adjusts the user's permission configuration in real time and dynamically changes which functions are accessible;
a permission management module for constructing, according to the new permission level optimized for the complex voice environment combined with real-time user behavior data, a behavior monitoring and anomaly detection model that analyzes user permissions in real time through feature engineering and time-series modeling, performs anomaly detection and behavior monitoring on user behavior, and dynamically adjusts the permission level according to the anomaly detection and behavior monitoring results; and for classifying, according to the permission level and the user's emotional state, the emotional state through an emotion classifier model, generating emotion labels, providing a mapping function from emotional state to permission response that dynamically adjusts the permission level according to the emotion-state label and the abnormal-behavior response strategy, computing the new permission level from this mapping function, and having the smart speaker immediately apply the new permission configuration;
a system optimization module for constructing, according to the permission level, a reinforcement-learning-based adaptive permission optimization model and, combined with the user's long-term behaviors and usage patterns, adaptively learning the smart speaker's permission strategy through reinforcement learning for long-term policy optimization.
The beneficial technical effects of the invention include at least the following:
First, the invention adopts user audio-signature recognition, identifying the user's identity in real time from audio features while the speaker is in use. This not only effectively distinguishes different users in a home or office environment but also automatically adjusts permission settings according to each user's usage habits and permission requirements, achieving genuinely personalized service and overcoming the inflexibility of traditional static permission settings.
Second, the invention introduces context-aware permission management: through multi-sensor data fusion (microphone, camera, light sensor, and the like) combined with user behavior data and environmental information such as time, location, and schedule, the system intelligently perceives the current context and dynamically adjusts permission settings. For example, in an evening home mode the system can automatically enable child protection and restrict access to unsuitable functions, greatly improving the system's intelligence and user experience and addressing the prior art's lack of dynamic adjustment capability.
The invention further designs an adaptive voice fuzzy matching algorithm, which adaptively adjusts voice recognition precision across various accents and speech rates, ensuring that the user's intention is accurately understood while the permission settings are flexibly adjusted to guarantee security. This adaptive mechanism solves the problem of low recognition accuracy in multi-language, multi-accent usage scenarios in the prior art and reduces the risk of erroneous permission operations.
In addition, the invention also integrates an abnormal authority behavior detection module and an emotion reasoning authority management module. The abnormal authority behavior detection module monitors and analyzes the authority use behavior of the user in real time by using a machine learning algorithm, can rapidly identify abnormal operation and take corresponding safety measures, and enhances the safety and the protection capability of the system. The emotion reasoning authority management module analyzes the emotion state of the user through voice, automatically adjusts the authority when the emotion of the user is excited, and prevents improper behaviors caused by emotion fluctuation. The innovation points effectively solve the defect that the prior art lacks response to the emotion and abnormal behavior of the user, and remarkably improve the safety and humanized experience of the system.
Through the innovation points, the invention provides a comprehensive solution, can effectively overcome various defects in the prior art, and realizes dynamic management, personalized service, safety protection and multi-user support of the authority of the intelligent sound box. The multi-layer and omnibearing intelligent authority management system not only improves user experience, but also obviously enhances the safety and adaptability of equipment, and meets the wide application requirements of intelligent sound boxes in diversified scenes such as families, offices and the like.
Drawings
The invention will be further described with reference to the accompanying drawings, in which embodiments do not constitute any limitation of the invention, and other drawings can be obtained by one of ordinary skill in the art without inventive effort from the following drawings.
Fig. 1 is a flowchart of a method for rights design management of an intelligent sound box according to an embodiment of the invention.
Fig. 2 is a frame diagram of an intelligent sound box authority design management system according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In one or more embodiments, as shown in fig. 1, the invention discloses a method for managing the authority design of an intelligent sound box, which comprises steps S1 to S6:
s1, collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model according to the preprocessed data, and outputting an audio signature.
Specifically, during preprocessing, the acquired audio data x(t) is processed to remove noise and normalized:

y(t)=x(t)−n̂(t)

Where y(t) represents the denoised audio signal. x(t) represents the original audio signal as a function of time. n̂(t) represents the estimated noise signal, extracted from the environment using an adaptive filter.
Signal normalization process, the normalized signal is expressed as:

ynorm(t)=(y(t)−μ(y))/σ(y)

Where ynorm(t) represents the normalized audio signal. μ(y) represents the mean value of the denoised signal y(t), calculated as μ(y)=(1/N)·Σt y(t), where N is the number of signal samples. σ(y) represents the standard deviation of the denoised signal y(t), calculated as σ(y)=√((1/N)·Σt (y(t)−μ(y))²).
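As an illustrative aside (not part of the patent text), the denoising and normalization step above can be sketched in a few lines of Python; the noise estimate `n_hat` stands in for the adaptive filter's output, which is assumed to be given:

```python
import numpy as np

def preprocess_audio(x, n_hat):
    """Denoise by subtracting the estimated noise, then z-score normalize.

    x     -- raw audio signal x(t), 1-D array
    n_hat -- estimated noise signal from the adaptive filter (assumed given)
    """
    y = x - n_hat                # y(t) = x(t) - n_hat(t)
    mu = y.mean()                # mean of the denoised signal
    sigma = y.std()              # standard deviation of the denoised signal
    return (y - mu) / sigma      # y_norm(t)

# toy example: a sine wave with a constant offset the filter "recovers"
x = np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.1
n_hat = np.full(100, 0.1)
y_norm = preprocess_audio(x, n_hat)
```

By construction the output has zero mean and unit standard deviation, which is what the normalization formula guarantees.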
Further, the audio features of the user are modeled using a Gaussian mixture model (GMM) to form an audio signature. The probability density function of the GMM is expressed as:

P(C|Θ)=Σk=1..K πk·N(C;μk,Σk)

Where C represents an extracted mel-frequency cepstral coefficient (MFCC) feature vector, used to represent the voiceprint features of the audio. Θ={πk,μk,Σk} represents the parameter set of the GMM. πk represents the mixing weight of the kth Gaussian component, satisfying Σk=1..K πk=1 and πk≥0. μk represents the mean vector of the kth Gaussian component. Σk represents the covariance matrix of the kth Gaussian component. K represents the number of Gaussian components, i.e. the number of Gaussian distributions in the model. N(C;μk,Σk) represents a multivariate normal distribution, expressed as:

N(C;μk,Σk)=(2π)^(−d/2)·|Σk|^(−1/2)·exp(−(1/2)·(C−μk)ᵀ·Σk⁻¹·(C−μk))

Where d is the dimension of the feature vector C.
Further, for each new audio input, the user identity is identified by computing the log-likelihood of its feature vector under each user model:

logL(Cnew|Si)=Σt=1..T log P(Cnew(t)|Θi)

Wherein logL(Cnew|Si) represents the log-likelihood value of the new feature vector under the ith user model, used to measure the degree of matching between the audio input and the user model. Cnew denotes the feature vector of the new audio input. Si represents the audio signature of the ith user, i.e. the GMM parameter set Θi. T represents the total number of frames of the new audio input, i.e. the number of frames after segmentation of the audio signal. P(Cnew(t)|Θi) represents the probability of the feature vector Cnew(t) of the tth frame under the user model Θi. The user whose model yields the highest log-likelihood is taken as the identified identity.
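The identification step can be sketched with plain numpy, assuming each user signature Si is stored as a diagonal-covariance GMM (weights, means, variances); the hypothetical `identify` helper returns the user whose model gives the highest total log-likelihood:

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Total log-likelihood of MFCC frames under one diagonal-covariance GMM.

    frames    -- (T, d) feature vectors C_new(t)
    weights   -- (K,) mixing weights pi_k
    means     -- (K, d) component means mu_k
    variances -- (K, d) diagonal covariances
    """
    T, d = frames.shape
    diff = frames[:, None, :] - means[None, :, :]          # (T, K, d)
    # log N(C; mu_k, Sigma_k) per (frame, component), diagonal Sigma_k
    log_norm = -0.5 * (d * np.log(2 * np.pi)
                       + np.log(variances).sum(axis=1)     # log|Sigma_k|
                       + (diff ** 2 / variances).sum(axis=2))
    # log P(C_new(t)) = logsumexp_k [log pi_k + log N_k]
    joint = np.log(weights)[None, :] + log_norm            # (T, K)
    m = joint.max(axis=1, keepdims=True)
    log_p = (m + np.log(np.exp(joint - m).sum(axis=1, keepdims=True))).ravel()
    return float(log_p.sum())

def identify(frames, signatures):
    """Index of the user signature with the highest log-likelihood."""
    return int(np.argmax([gmm_frame_loglik(frames, *s) for s in signatures]))

# toy usage: two synthetic one-component signatures, input near user 1's mean
sig_a = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
sig_b = (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]]))
frames = np.full((10, 2), 5.0)
best = identify(frames, [sig_a, sig_b])
```

In practice the per-user GMMs would be trained on enrollment audio (e.g. via EM); this sketch only shows the scoring and argmax decision.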
S2, taking the audio signature as input to represent the identity characteristic of the user, collecting real-time environment data, preprocessing, constructing a user-situation joint feature vector according to the identity characteristic of the user and the preprocessed real-time environment data, adding a regular term into the user-situation joint feature vector to carry out recognition accuracy adjustment, and then carrying out situation awareness and dynamic authority configuration initialization to obtain the optimized authority level and authority score.
The intelligent sound box collects environmental data in real time through various sensors (such as a microphone, a camera, a light sensor, a temperature and humidity sensor, an accelerometer and the like) to form an environmental characteristic vector E= [ E 1,e2,…,en ]. These environmental features include:
e 1 represents an ambient noise level, ranging from 0 to 100 db.
E 2 represents the illumination intensity, ranging from 0 to 10,000 lux.
E 3 denotes the distance of the user from the loudspeaker in the range of 0 to 5 meters.
E 4 represents the current time, 24 hours, ranging from 0 to 23.99.
E 5 denotes the number of detected persons, and the integer range is 0 to 10.
E 6 denotes the user activity state, classifies the variables, and takes the value range { stationary, walking, jumping, etc }.
Other sensor data such as temperature e 7, humidity e 8, etc.
Further, since the data ranges and units of different sensors differ, the environmental feature vector E needs to be standardized to unify the data scales:

einorm=(ei−min(ei))/(max(ei)−min(ei))

Wherein einorm represents the normalized environmental feature. min(ei) and max(ei) represent the minimum and maximum values of the environmental feature ei, used for the normalization processing.
Further, the user identity and the environmental characteristics are fused by using the user audio signature S i obtained in the first step, and a novel fusion method is introduced, so that the user identity characteristics and the environmental data are fused, and a situation characteristic vector C e is constructed. The specific method is that a weighted multi-layer perceptron (MLP) model is used, and the model can dynamically adjust the weights of the user characteristics and the environment characteristics in the learning process so as to reflect the behavior modes of the user in different environments.
Ce=σ(W2·σ(W1·[Enorm,Si]+b1)+b2)
Wherein Ce represents the generated contextual feature vector, representing the behavioral characteristics of the user in the current environment. [Enorm, Si] represents the input feature obtained by combining the environmental feature vector and the user audio signature. W1, W2 represent weight matrices, with dimensions m×(n+1) and m×m respectively, learned through model training. b1, b2 represent bias vectors, both of dimension m, learned through model training. σ represents a nonlinear activation function (e.g., ReLU or Sigmoid) for capturing nonlinear relationships.
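A minimal numpy sketch of the MLP fusion above; the weight matrices are random here purely for illustration, whereas the patent assumes they are learned through model training:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def context_vector(e_norm, s_i, W1, b1, W2, b2):
    """Two-layer MLP fusing environment features with the audio signature:
    C_e = sigma(W2 . sigma(W1 . [E_norm, S_i] + b1) + b2)."""
    x = np.concatenate([e_norm, s_i])      # [E_norm, S_i]
    return relu(W2 @ relu(W1 @ x + b1) + b2)

# illustrative dimensions: n environment features, p signature values, m hidden units
n, p, m = 8, 4, 6
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(m, n + p)), np.zeros(m)   # trained in practice
W2, b2 = rng.normal(size=(m, m)), np.zeros(m)
c_e = context_vector(rng.uniform(size=n), rng.normal(size=p), W1, b1, W2, b2)
```

The output is an m-dimensional non-negative vector (ReLU), one context feature vector Ce per (user, environment) pair.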
Further, a user-context joint feature vector f= [ S i,Ce ] is generated in combination with the context feature vector C e and the user audio signature S i. This is a fused feature vector that represents the comprehensive state of a particular user in the current context.
In order to better capture the complex relationship between the user features and the environmental features, a joint feature modeling method with innovative regularization terms is provided. This regularization term achieves higher recognition accuracy by a weighted combination of the user and environmental features. The model formula is as follows:
Fopt=F+λ·(F⊙Wreg)
Wherein Fopt represents the optimized joint feature vector, taking into account the weighted relationship of the user and the environmental features. λ represents a regularization parameter for balancing the effects of the original features and the regularization term, typically determined by cross-validation. ⊙ denotes the Hadamard product, used for element-wise weighting. Wreg represents a weight matrix describing the interrelationship between features, obtained through dynamic learning during optimization.
Further, based on the optimized joint feature vector Fopt, a permission scoring function with an additional term is provided for calculating the permission score of the user in the current context:

Pscore=wᵀ·Fopt+β·‖Fopt‖2+b

Wherein Pscore represents the permission score, reflecting the permission level of the user in the current context. w represents a weight vector obtained through model training, representing the importance of each feature. β represents a weight adjustment coefficient, controlling the overall weight of the joint feature. ‖Fopt‖2 represents the two-norm of the joint feature vector, taken as a measure of feature importance and adding a balancing constraint among the features. b represents the bias term, learned from training data.
According to the calculated permission score Pscore and preset permission level thresholds, dynamic permission adjustment is performed. The permission adjustment function is defined as:

Plevel = L1, if Pscore&lt;T1
Plevel = L2, if T1≤Pscore&lt;T2
Plevel = L3, if Pscore≥T2

Wherein Plevel represents the dynamically adjusted permission level, according to which the specific permission configuration is set, with L1, L2, L3 denoting increasingly permissive preset levels. T1, T2 represent permission thresholds, set according to user requirements and system configuration.
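The threshold-based mapping above can be sketched as a simple piecewise function; the level names low/medium/high are illustrative stand-ins, since the patent leaves the concrete level labels unspecified:

```python
def permission_level(p_score, t1, t2):
    """Map the permission score to a discrete level using thresholds T1 < T2.

    Returns 'low' below T1, 'medium' between T1 and T2, 'high' at or above T2.
    """
    if p_score < t1:
        return "low"
    if p_score < t2:
        return "medium"
    return "high"
```

For example, with T1=0.4 and T2=0.7, a score of 0.5 maps to the medium level.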
The finally determined permission level Plevel is applied to the intelligent sound box system and dynamically controls the functions and data permissions accessible to the user.
And S3, according to the optimized authority level and authority score, if complex speech environments of multiple languages and multiple accents appear, designing fuzzy matching and dynamic regulation models to dynamically manage the authority of the intelligent sound box in real time.
Specifically, when a user sends a voice command, the sound box collects a current voice input signal x (t), and performs preprocessing (such as denoising and standardization) on the voice signal by using the method in the first step to generate a processed voice feature vector C new.
Meanwhile, in order to adapt to a multi-language and multi-accent environment, an adaptive voice fuzzy matching model is provided. The model uses a feature-weighted aggregation method to calculate the credibility score γ of the voice feature vector Cnew:

γ=Σj=1..M wj·exp(−αj·(Cnew,j−μj)²)

Where γ represents the confidence score of the voice input, indicating the confidence of the matching degree of the current voice features. M represents the total number of voice features. wj represents the weight of the jth feature, indicating the importance of that feature in calculating voice confidence. αj denotes the fuzzy tuning parameter of the jth feature, controlling the sensitivity of feature matching. Cnew,j denotes the jth component of the voice feature vector. μj represents the expected value of the jth feature, representing the typical value of that feature under normal conditions.
Further, a confidence-based permission adjustment policy is defined, utilizing the confidence score γ of the voice input and the permission score Pscore from step S2. A permission adjustment function is provided to dynamically adjust the permission level of the user:

Pnew=Pinit+η·(γ−δ)·(Pscore−τ)

Wherein Pnew represents the dynamically adjusted permission level, set according to the voice credibility and the original permission score. Pinit represents the initial permission level output by step S2. η represents a gain factor of the permission adjustment, controlling the adjustment amplitude. γ denotes the confidence score of the voice input. δ represents an intermediate threshold of the confidence score, typically set to 0.5. Pscore represents the permission score calculated in step S2. τ represents a baseline threshold of the permission score, typically set based on actual usage.
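Under the assumption that the credibility score uses a Gaussian-style membership per feature (one plausible reading of the feature-weighted aggregation described above), the confidence computation and the permission adjustment can be sketched as:

```python
import math

def confidence_score(c_new, weights, alphas, mus):
    """gamma = sum_j w_j * exp(-alpha_j * (C_new,j - mu_j)^2).

    The exponential membership form is an assumed reconstruction, not a
    verbatim formula from the patent.
    """
    return sum(w * math.exp(-a * (c - m) ** 2)
               for w, a, c, m in zip(weights, alphas, c_new, mus))

def adjust_permission(p_init, p_score, gamma, eta=0.5, delta=0.5, tau=0.6):
    """Raise the level when the input is credible (gamma > delta) and the
    permission score exceeds its baseline tau; lower it otherwise."""
    return p_init + eta * (gamma - delta) * (p_score - tau)

# a well-matched utterance whose features equal their expected values
gamma = confidence_score([1.0, 2.0], weights=[0.6, 0.4],
                         alphas=[1.0, 1.0], mus=[1.0, 2.0])
new_level = adjust_permission(p_init=2.0, p_score=0.9, gamma=gamma)
```

A perfect feature match gives γ = Σ wj = 1.0 here, so the level is nudged upward.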
Further, according to the calculated new permission level Pnew, the intelligent sound box system adjusts the permission configuration of the user in real time and dynamically changes the accessibility of functions.
In order to further improve the robustness and accuracy of the system, an "error correction feedback mechanism" is introduced. The mechanism dynamically adjusts the weight and parameters of the fuzzy matching model by detecting and analyzing user feedback or use behaviors after rights adjustment. The core formula of error correction is as follows:
Δw=ϵ·(γtarget−γ)·Cnew

Where Δw represents a weight adjustment vector for modifying the weights of the voice features. ϵ denotes the learning rate, controlling the step size of the weight adjustment. γtarget represents the target confidence score, typically set to the desired confidence level of the system. γ denotes the confidence score of the current voice input. Cnew denotes the feature vector of the current voice input.
The parameters of the voice fuzzy matching model are updated according to the error correction formula, so that the authority configuration can be more accurately adjusted when the system faces to voice input of different users and contexts.
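The error-correction step can be sketched directly from the Δw formula; a single update nudges the feature weights toward the target confidence:

```python
import numpy as np

def correct_weights(w, c_new, gamma, gamma_target, lr=0.01):
    """One error-correction step: delta_w = lr * (gamma_target - gamma) * C_new.

    When the achieved confidence gamma falls short of gamma_target, weights
    grow in proportion to the input features; when it overshoots, they shrink.
    """
    return w + lr * (gamma_target - gamma) * np.asarray(c_new)

w = np.array([0.6, 0.4])
w_updated = correct_weights(w, c_new=[1.0, 2.0], gamma=0.3, gamma_target=0.8)
```

Repeating this update over many feedback events is what the patent describes as continuously tuning the fuzzy matching model.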
Further, in combination with the instant feedback and long-term usage behavior data of the user, the permission adjustment function and the fuzzy matching model are continuously optimized. The system parameters are dynamically adjusted by monitoring the operating habits and permission usage of the user in real time, improving the adaptability and accuracy of permission adjustment.
And collecting the use data (such as the behavior, satisfaction feedback and the like of the user under the adjusted authority level) of the user, and periodically retraining the fuzzy matching model and the authority adjustment function to ensure that the model parameters are consistent with the dynamic changes of the user demands.
S4, according to the new permission level optimized for the complex voice environment, combined with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permissions in real time through feature engineering and time-series modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting the permission level according to the anomaly detection and behavior monitoring results.
Specifically, extracting the user behavior features includes:
The intelligent sound box continuously collects operation behavior data of a user and generates a behavior feature vector B= [ B 1,b2,…,bk ], wherein the features comprise:
b 1 represents the type of voice instruction (such as playing music, setting alarm clock, etc.) issued by the user, and is represented as a classification variable.
B 2 denotes the time interval (unit: second) at which the instruction occurs, expressed as a continuous variable.
B 3 denotes the frequency of instructions (units: times/min) issued by the user during a specific time period, expressed as a continuous variable.
B 4 represents the authority level change amplitude corresponding to each instruction, and represents an integer.
B 5 represents the identified emotional state of the user, expressed as a classification variable (e.g., calm, excited, etc.).
And (3) normalizing and weighting the behavior feature vector B to eliminate scale difference among different features. Then, weighted combination is performed according to the feature importance, and a weighted behavior feature vector B w is generated:
Bw=WB·Bnorm
wherein B w represents a weighted behavior feature vector, which is represented as a behavior feature weighted by different weights. W B denotes a weight matrix for weighting of different features. B norm denotes the normalized behavior feature vector, expressed as normalized feature values.
Further, a behavior monitoring and anomaly detection model based on a long-short-time memory network (LSTM) is constructed using the weighted behavior feature vector B w for capturing the time dependence of the user behavior. The goal of the model is to predict the behavior characteristics at the next moment, thereby identifying the abnormal pattern of behavior. The updated formula of the LSTM model is:
ht=fLSTM(Bw,t,ht-1)
Wherein h t represents the hidden state at the current time t, capturing the time dependence of the user behavior. B w,t denotes the weighted behavior feature vector at the current time t. h t-1 denotes the hidden state at the previous time t-1. f LSTM represents the state update function of LSTM, defined as a conventional LSTM cell calculation formula.
Whether abnormal behavior exists is judged by calculating the residual between the actual and predicted behavior features. The anomaly score ξ is calculated by the following formula:

ξ=‖Bw,t−B̂w,t‖²/σ²

Where ξ represents the anomaly score, representing the degree of difference between the current behavior and the predicted behavior. Bw,t represents the actual weighted behavior feature vector at the current time t. B̂w,t represents the weighted behavior feature vector predicted by the LSTM model for time t. σ² represents the variance of the prediction residual, used to normalize the anomaly score.
Further, according to the anomaly score ζ and a preset anomaly threshold value θ, it is determined whether to trigger a permission adjustment or a warning mechanism. The adjustment strategy is as follows:
If xi is less than or equal to theta, the current authority level is maintained without adjustment
If xi > theta, identifying the abnormal behavior, and carrying out authority tightening or prompting the user to carry out identity verification according to the severity of the abnormality.
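A compact sketch of the anomaly scoring and decision rule above, assuming the LSTM's prediction and the residual variance σ² are already available:

```python
import numpy as np

def anomaly_score(b_actual, b_pred, sigma2):
    """xi = ||B_w,t - B_hat_w,t||^2 / sigma^2 -- squared residual between
    observed and predicted behavior features, normalized by residual variance."""
    r = np.asarray(b_actual) - np.asarray(b_pred)
    return float(r @ r) / sigma2

def decide(xi, theta):
    """Keep the permission level when xi <= theta; flag an anomaly otherwise."""
    return "keep" if xi <= theta else "tighten_or_verify"

xi = anomaly_score([1.0, 0.5], [0.9, 0.6], sigma2=0.01)
action = decide(xi, theta=1.0)
```

The "tighten_or_verify" branch corresponds to the patent's permission tightening or prompting the user for identity verification, graded by anomaly severity.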
Further, a permission dynamic adjustment function under abnormal behavior conditions is defined, dynamically adjusting the permission level based on the anomaly score ξ:

Pnew=Pprev−ζ·(ξ−θ), if ξ&gt;θ

Wherein Pnew represents the dynamically adjusted permission level, taking abnormal behavior into account. Pprev represents the permission level output by step S3. ζ represents the adjustment step parameter, controlling the adjustment range of the permission. ξ represents the current anomaly score. θ represents the anomaly detection threshold, generally set based on actual user behavior data.
Further, the time sequence behavior model and the abnormality detection mechanism are continuously optimized by combining feedback of the user and actual behavior data. Through online learning and periodic model updating, the self-adaptability of the model to the behavior change of the user is ensured. User behavior data (such as abnormal behavior times, authority adjustment frequency and the like) are collected, parameters of the LSTM model are updated periodically, and an abnormality detection threshold value theta and a weight matrix W B are adjusted to ensure that the system can stably run in a changeable user behavior environment.
S5, according to the permission level, combined with the emotional state of the user, classifying the emotional state through an emotion classifier model to generate an emotion label, providing a mapping function between emotional states and permission responses so as to dynamically adjust the permission level according to the emotion label and the abnormal-behavior response strategy, and the intelligent sound box instantly applies the new permission configuration calculated from the emotion-permission response mapping function.
Further, the user emotion state characteristics are extracted, and the intelligent sound box is combined with the microphone and the camera to collect the multi-mode emotion data of the user in real time, wherein the multi-mode emotion data comprises audio and video characteristics. The audio feature vector a= [ a 1,a2,…,am ] may include:
a 1 represents the voice pitch frequency variation, expressed as a continuous variable.
A 2 represents the speech rate variation, expressed as a continuous variable.
A 3 represents the intensity of the voice volume, expressed as a continuous variable.
Other audio features such as speech discontinuities, timbre, etc.
The video feature vector v= [ V 1,v2,…,vn ] may include:
v 1 denotes a facial expression feature (such as eye-horn up, mouth-horn down, etc.), expressed as a classification variable.
V 2 denotes an eye movement characteristic (such as eye movement speed, direction, etc.), expressed as a continuous variable.
V 3 denotes head pose (e.g., rotation angle, pitch angle, etc.), expressed as a continuous variable.
Further, the audio feature vector A and the video feature vector V are fused to generate a fused emotion feature vector E. A fusion model based on the self-attention mechanism is used to capture interactions between the multi-modal features, with the specific formula:

E=softmax((A·WA)·(V·WV)ᵀ/√dk)·(V·WV)

Where E represents the fused emotion feature vector, used to describe the current comprehensive emotional state of the user. A represents the audio feature vector, with dimension m. V denotes the video feature vector, with dimension n. WA and WV represent projection matrices for mapping the audio and video features into the same space, with dimensions m×dk and n×dk respectively. dk represents the dimension of the key vectors in the attention mechanism, used for scaling the dot-product operation. softmax represents the normalization function used to calculate the attention weights.
Further, the fused emotion feature vector E is classified through an emotion classifier model to generate an emotion label c:

c=argmaxi softmax(wiᵀ·E+bi)

Where c represents the emotional state label of the current user, expressed as a discrete category (e.g., calm, anger, happiness, etc.). wi represents the weight vector of the classifier model corresponding to the ith class, with dimension dk. bi represents the bias term of the classifier model corresponding to the ith class, a scalar.
Further, according to the emotion state label c and the abnormal-behavior response strategy, a mapping function between emotional state and permission response is provided for dynamically adjusting the permission level:

Pnew=Pprev+ρ(c)

Wherein Pnew represents the permission level dynamically adjusted according to the emotional state, taking the user's emotion into account in the permission setting. Pprev represents the permission level output by step S4. ρ(c) represents a permission adjustment delta function, expressed as an integer that can be positive or negative depending on the user's emotional state c.
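Classification and the ρ(c) mapping can be sketched together; the classifier weights and the per-emotion deltas in `RHO` are illustrative values, not taken from the patent:

```python
import numpy as np

def classify_emotion(e, W, b, labels):
    """c = argmax_i softmax(w_i^T E + b_i); softmax is monotonic, so the
    argmax over the raw logits suffices."""
    return labels[int(np.argmax(W @ e + b))]

# rho(c): per-emotion permission delta -- concrete values are hypothetical
RHO = {"calm": 0, "happy": 0, "angry": -1, "excited": -1}

def emotion_adjusted_level(p_prev, label):
    """P_new = P_prev + rho(c): tighten permissions under agitated states."""
    return p_prev + RHO[label]

labels = ["calm", "happy", "angry", "excited"]
W = np.eye(4)                       # toy classifier weights; trained in practice
b = np.zeros(4)
e = np.array([0.1, 0.2, 0.9, 0.3])  # fused emotion features leaning "angry"
label = classify_emotion(e, W, b, labels)
new_level = emotion_adjusted_level(3, label)
```

Here the agitated state lowers the level by one, matching the patent's intent of restraining permissions during emotional excitement.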
Further, according to the permission level calculated by the emotion state and permission response mapping function, the intelligent sound box immediately applies the new permission configuration, ensuring that the user operates safely and reasonably in the current emotional state.
And continuously optimizing the emotion classification model and the permission response strategy by combining the instant feedback of the user with long-term behavior pattern data. By monitoring the behavior and emotional changes of the user in real time, the parameters of the emotion classifier are updated regularly, and the permission response function and the emotion feature fusion model are adjusted to ensure that the model can adapt to dynamic changes in the user's emotional state.
S6, according to the permission level, constructing an adaptive permission optimization model based on reinforcement learning, performing adaptive learning of the permission strategy of the intelligent sound box through reinforcement learning, and carrying out long-term strategy optimization in combination with the long-term behaviors and usage patterns of the user.
Specifically, the state of the reinforcement-learning-based adaptive permission optimization model is defined by the current permission level and emotional state of the user together with the user's historical behavior pattern and system feedback, i.e. a state vector S=[Plevel, c, H, F]. Wherein Plevel represents the dynamically adjusted permission level output in step S5, representing the current permission configuration optimized according to the emotional state of the user. c represents the current emotional state label of the user, expressed as a discrete category (e.g., calm, anger, happiness, etc.). H represents the historical behavior pattern feature vector of the user, representing statistical features of the user behavior over a period of time, such as operating frequency and preferred operation types. F represents the system feedback feature vector, representing user feedback data of the system under different permission configurations, such as user satisfaction scores and erroneous-operation frequencies.
The system-executable actions are defined as a set of permission adjustment operations A={a1, a2, …, an}, wherein each action ai represents an adjustment to the permission level (e.g., raise, lower, hold). The design of these actions needs to take into account permission security, user experience, and the functional limitations of the smart speaker.
Further, to enable the system to maintain an optimal permission configuration under varying user behavior and emotional states, a policy-optimization reinforcement learning approach is adopted. An optimization objective function J(θ) is defined to maximize the expected cumulative reward by optimizing the parameter θ of the policy πθ(S):

J(θ)=E S~ρ [Σt=0..∞ γᵗ·r(St,at)]

Where J(θ) represents the optimization objective function of the policy parameter θ, representing the expected cumulative reward. S represents the current state vector, consisting of the permission level, emotional state, historical behavior pattern, and system feedback. ρ represents the state distribution, describing the possible state space. γ represents a discount factor with value range 0≤γ≤1, measuring the importance of future rewards. r(St,at) represents the instant reward obtained when action at is performed in state St, reflecting the validity and rationality of the permission adjustment. at represents the action performed at time t, as determined by the policy πθ(S).
In the policy optimization process, to prevent the policy from over-fitting the short-term behavior pattern of the user, a policy regularization term Ω(θ) is introduced, and the policy parameters θ are updated to maximize the regularized optimization objective:

θt+1=θt+α·(∇θJ(θt)−λ·∇θΩ(θt))

Where θt+1 represents the updated policy parameters. θt represents the policy parameters at the current time t. α represents the learning rate, controlling the step size of the parameter update. ∇θJ(θt) represents the policy gradient, i.e. the gradient of the optimization objective function with respect to the policy parameters. λ represents a regularization strength parameter, controlling the extent to which the regularization term affects the policy update. Ω(θt) represents the policy regularization term, designed as sparsity regularization to encourage conciseness and generalization of the policy parameters, with the formula:

Ω(θ)=(1/N)·Σi=1..N |θi|

Where N represents the total number of policy parameters. θi represents the ith policy parameter.
Further, the design of the instant reward function r(St,at) combines user behavior feedback, emotional state change, and the rationality of the permission adjustment, aiming to balance system security and user experience. The formula of the reward function is:

r(St,at)=ω1·f(Ut)−ω2·g(Et)−ω3·h(|ΔPt|)

Where r(St,at) represents the instant reward, measuring the effect of executing action at in state St. Ut represents the change in the user satisfaction score at the current time t, and the function f(Ut) represents a positive reward, positively correlated with a reasonable permission adjustment. Et represents the change in the fluctuation of the user's emotional state, and the function g(Et) represents a negative reward, related to the negative emotional response caused by the permission adjustment. |ΔPt| represents the magnitude of the permission adjustment, and the function h(|ΔPt|) represents a negative reward, penalizing excessively frequent permission adjustment operations. ω1, ω2, ω3 represent weight coefficients, controlling the influence of each factor on the instant reward.
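A sketch of the reward computation under the assumption that f, g, h take simple identity/absolute-value shapes (the patent leaves their exact forms unspecified):

```python
def reward(delta_u, delta_e, delta_p, w1=1.0, w2=0.5, w3=0.2):
    """r = w1*f(U_t) - w2*g(E_t) - w3*h(|dP_t|).

    delta_u -- change in user satisfaction score (positive is good)
    delta_e -- change in emotional-state fluctuation (positive is bad)
    delta_p -- magnitude of the permission adjustment
    """
    f = delta_u                 # positive reward: satisfaction improvement
    g = max(0.0, delta_e)       # negative reward: worsened emotional fluctuation
    h = abs(delta_p)            # negative reward: large/frequent adjustments
    return w1 * f - w2 * g - w3 * h

r = reward(delta_u=0.4, delta_e=0.2, delta_p=1.0)
```

The three weights ω1–ω3 let the designer trade off user experience against stability of the permission policy.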
To further improve the adaptability and generalization ability of the model, we introduced an adaptive learning mechanism based on user behavior patterns. The mechanism dynamically adjusts the learning rate and regularization parameters of the strategy model by continuously monitoring the behavior characteristics and the emotion state changes of the user so as to adapt to the personalized requirements of the user. Learning rate and regularization parameter dynamic adjustment formula:
αt+1=αt·(1-η·|ΔHt|)
λt+1=λt·(1+ζ·|ΔFt|)
Where α t+1 denotes the learning rate at the next time. Alpha t represents the learning rate at the current time. η represents a learning rate adjustment coefficient, and the adjustment range of the learning rate is controlled. The |Δh t | represents the variation amplitude of the user behavior pattern feature vector, and represents the variation degree of the user behavior. Lambda t+1 represents the regularized intensity parameter at the next time instant. Lambda t represents the regularized strength parameter at the current time. ζ represents a regularization parameter adjustment coefficient, and an adjustment amplitude of the regularization intensity is controlled. The |Δf t | represents the variation amplitude of the system feedback feature vector, and represents the variation degree of the user feedback.
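The two hyperparameter updates can be sketched in one helper; a large behavior shift shrinks the learning rate, while volatile feedback strengthens regularization:

```python
def update_hyperparams(alpha, lam, dH, dF, eta=0.1, zeta=0.1):
    """alpha_{t+1} = alpha_t * (1 - eta*|dH_t|): damp learning when the user's
    behavior pattern shifts sharply.
    lambda_{t+1} = lambda_t * (1 + zeta*|dF_t|): regularize harder when the
    system feedback is volatile."""
    return alpha * (1 - eta * abs(dH)), lam * (1 + zeta * abs(dF))

alpha_next, lam_next = update_hyperparams(alpha=0.05, lam=0.01, dH=2.0, dF=1.0)
```

In a real deployment eta and zeta would be tuned so that alpha never goes negative for the observed range of |ΔH|.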
Further, a policy optimization and feedback loop is established in combination with the long-term behavioral data and immediate feedback of the user. And the optimal performance of the system in long-term operation is ensured by continuously monitoring the performance of the system under different authority configurations and periodically updating parameters of the strategy model.
And collecting long-term behavior data (such as authority adjustment frequency, user satisfaction change, emotion state fluctuation and the like) of the user, and periodically updating parameters of the reinforcement learning model, including strategy parameters theta, regularization parameters lambda and learning rate alpha, so as to ensure that the model can adapt to long-term changes of the user behavior and emotion state.
The embodiment of the application also provides a smart speaker permission design management system, as shown in fig. 2, which comprises:
a user data collection module 101 for collecting user audio data and preprocessing it, constructing a Gaussian mixture model from the preprocessed data, and outputting an audio signature;
a configuration initialization module 102 configured to take the audio signature as input representing the user's identity features, collect and preprocess real-time environment data, construct a user-context joint feature vector from the identity features combined with the preprocessed environment data, add a regularization term to the joint feature vector to adjust recognition accuracy, and then perform context awareness and dynamic permission configuration initialization to obtain an optimized permission level and permission score;
a permission initialization module 103 configured to apply, according to the optimized permission level and permission score, a fuzzy matching and dynamic adjustment model for real-time dynamic management of the smart speaker's permissions when a complex speech environment with multiple languages and accents appears, where the fuzzy matching and dynamic adjustment model calculates a credibility score γ for the speech feature vector C_new by a feature-weighted aggregation method, expressed as follows:
wherein γ denotes the credibility score of the speech input, representing the credibility of the current speech-feature match; M denotes the total number of speech features; w_j denotes the weight of the j-th feature, representing its importance when computing speech credibility; α_j denotes the fuzzy adjustment parameter of the j-th feature, controlling the sensitivity of feature matching; C_new,j denotes the j-th component of the speech feature vector; μ_j denotes the expected value of the j-th feature, representing its typical value under normal conditions;
using the credibility score γ of the speech input and the optimized permission score, the user's permission level is adjusted dynamically according to a permission adjustment function, expressed as follows:
wherein the dynamically adjusted permission level denotes the permission setting after adjustment according to the speech credibility and the original permission score; the initial permission level is that output by the second step; η denotes the permission-adjustment gain coefficient, controlling the adjustment amplitude; γ denotes the credibility score of the speech input; δ denotes the intermediate threshold of the credibility score; τ denotes the baseline threshold of the optimized permission score; according to the newly calculated permission level, the smart speaker system adjusts the user's permission configuration in real time, dynamically changing the accessibility of functions;
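The published formulas for the credibility score and the permission adjustment function appear only as images and did not survive extraction. The sketch below therefore assumes one plausible form for each, chosen to be consistent with the symbols listed in the text (w_j, α_j, C_new,j, μ_j, η, δ, τ); it is not the patent's actual formula.

```python
import numpy as np

def confidence_score(C_new, mu, w, alpha):
    """Feature-weighted aggregation of per-feature match credibility.
    Assumes a Gaussian-style fuzzy membership per feature, consistent
    with the listed symbols: weights w_j, fuzzy parameters alpha_j,
    components C_new_j, expected values mu_j."""
    C_new, mu, w, alpha = map(np.asarray, (C_new, mu, w, alpha))
    membership = np.exp(-alpha * (C_new - mu) ** 2)   # each term in (0, 1]
    return float(np.dot(w, membership) / np.sum(w))   # normalized to [0, 1]

def adjust_permission(level0, score, gamma, eta=1.0, delta=0.5, tau=0.6):
    """Hypothetical permission-adjustment function: shift the initial
    level in proportion to how far gamma sits above the mid threshold
    delta, gated on the permission score clearing its baseline tau."""
    if score < tau:
        return max(level0 - 1, 0)      # score below baseline: tighten
    shift = eta * (gamma - delta)      # credibility-driven shift
    return max(int(round(level0 + shift)), 0)
```

A perfect feature match (C_new equal to μ) yields γ = 1; mismatches decay the per-feature membership at a rate set by α_j before the weighted average is taken.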
a permission management module 104 for constructing, based on the new permission level optimized for the complex speech environment combined with real-time user behavior data, a behavior monitoring and anomaly detection model that analyzes user permissions in real time through feature engineering and time-series modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting the permission level according to the results of the anomaly detection and behavior monitoring; and for classifying, according to the permission level combined with the user's emotional state, the emotional state through an emotion classifier model to generate emotion labels, providing a mapping function between emotional state and permission response that dynamically adjusts the permission level according to the emotion-state labels and an abnormal-behavior response strategy, the smart speaker instantly applying the new permission configuration at the permission level calculated by that mapping function;
a system optimization module 105 for constructing, according to the permission level, a reinforcement-learning-based adaptive permission optimization model that combines the user's long-term behavior and usage patterns and adaptively learns the smart speaker's permission strategy through reinforcement learning for long-term strategy optimization.
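The anomaly-scoring and threshold-based adjustment strategy of module 104 (detailed in claim 7) can be sketched as follows. The exact residual norm and normalization in the published score are assumptions; only the "residual between actual and predicted behavior, normalized by the residual variance σ²" structure is given in the text.

```python
import numpy as np

def anomaly_score(b_actual, b_pred, sigma2):
    """Residual-based anomaly score xi for the LSTM behavior monitor:
    squared residual between the actual weighted behavior features and
    the LSTM prediction, normalized by the residual variance sigma^2
    (the specific norm is an assumption)."""
    r = np.asarray(b_actual, dtype=float) - np.asarray(b_pred, dtype=float)
    return float(np.dot(r, r) / sigma2)

def respond(level, xi, theta=3.0):
    """Adjustment strategy from claim 7: keep the level if xi <= theta,
    otherwise tighten permissions (the re-authentication prompt for
    severe anomalies is omitted in this sketch)."""
    return level if xi <= theta else max(level - 1, 0)
```

A behavior sequence that the LSTM predicts well produces a score near zero and leaves the permission level untouched; a large residual crosses the threshold θ and triggers tightening.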
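A single update step of the regularized policy optimization described for module 105 (and in claim 9) might look like the following. The exact placement of λ relative to α in the published update formula is assumed; the L1 sparsity term Ω(θ) = (1/N)·Σ|θ_i| matches the claim's description.

```python
import numpy as np

def policy_update(theta, grad_J, alpha=0.01, lam=0.001):
    """One regularized policy-gradient step following the claim-9
    description: theta_{t+1} = theta_t + alpha * (grad_J - lam * dOmega),
    where Omega(theta) = (1/N) * sum_i |theta_i| is the sparsity
    regularizer discouraging overfit to short-term behavior."""
    theta = np.asarray(theta, dtype=float)
    grad_omega = np.sign(theta) / theta.size  # subgradient of the L1 term
    return theta + alpha * (np.asarray(grad_J, dtype=float) - lam * grad_omega)
```

The ascent direction follows the policy gradient of the expected discounted reward J(θ), minus a small pull of each parameter toward zero, which is what keeps the learned permission policy simple and generalizable.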
The foregoing discloses only preferred embodiments of the present invention and is not intended to limit its scope; as will be understood by those skilled in the art, implementations realizing all or part of the above embodiments, and equivalent variations thereof made within the scope of the appended claims, fall within the scope of the invention.

Claims (10)

1. A smart speaker permission design management method, characterized in that the method comprises:
S1, collecting user audio data and preprocessing the audio data, and constructing a Gaussian mixture model from the preprocessed data to output an audio signature;
S2, taking the audio signature as input representing the user's identity features, collecting real-time environment data and preprocessing it, constructing a user-context joint feature vector from the identity features combined with the preprocessed real-time environment data, adding a regularization term to the joint feature vector to adjust recognition accuracy, and then performing context awareness and dynamic permission configuration initialization to obtain an optimized permission level and permission score;
S3, according to the optimized permission level and permission score, if a complex speech environment with multiple languages and multiple accents appears, designing a fuzzy matching and dynamic adjustment model for real-time dynamic management of the smart speaker's permissions; wherein the fuzzy matching and dynamic adjustment model calculates the credibility score γ of the speech feature vector C_new using a feature-weighted aggregation method, expressed as follows:
wherein γ denotes the credibility score of the speech input, representing the credibility of the current speech-feature match; M denotes the total number of speech features; w_j denotes the weight of the j-th feature, representing its importance when computing speech credibility; α_j denotes the fuzzy adjustment parameter of the j-th feature, controlling the sensitivity of feature matching; C_new,j denotes the j-th component of the speech feature vector; μ_j denotes the expected value of the j-th feature, representing its typical value under normal conditions;
using the credibility score γ of the speech input and the optimized permission score, dynamically adjusting the user's permission level according to a permission adjustment function, expressed as follows:
wherein the dynamically adjusted permission level denotes the permission setting after adjustment according to the speech credibility and the original permission score; the initial permission level is that output by the second step; η denotes the permission-adjustment gain coefficient, controlling the adjustment amplitude; γ denotes the credibility score of the speech input; δ denotes the intermediate threshold of the credibility score; τ denotes the baseline threshold of the optimized permission score; according to the newly calculated permission level, the smart speaker system instantly adjusts the user's permission configuration, dynamically changing the accessibility of functions;
S4, based on the new permission level optimized for the complex speech environment combined with real-time user behavior data, constructing a behavior monitoring and anomaly detection model that analyzes user permissions in real time through feature engineering and time-series modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting the permission level according to the results of the anomaly detection and behavior monitoring;
S5, according to the permission level combined with the user's emotional state, classifying the emotional state through an emotion classifier model to generate emotion labels, and, according to the emotion-state labels and an abnormal-behavior response strategy, proposing a mapping function between emotional state and permission response to dynamically adjust the permission level, the smart speaker instantly applying the new permission configuration at the permission level calculated by the emotion-state-to-permission-response mapping function;
S6, according to the permission level, constructing a reinforcement-learning-based adaptive permission optimization model that combines the user's long-term behavior and usage patterns and adaptively learns the smart speaker's permission strategy through reinforcement learning for long-term strategy optimization.
2. The smart speaker permission design management method according to claim 1, characterized in that the preprocessing processes the collected audio data x(t) to remove noise and perform normalization;
wherein the probability density function of the Gaussian mixture model is expressed as:
P(C|Θ) = Σ_{k=1}^{K} π_k·N(C; μ_k, Σ_k)
wherein C denotes the extracted Mel-frequency cepstral coefficient feature vector, used to represent the voiceprint features of the audio; Θ denotes the parameter set of the Gaussian mixture model; π_k denotes the mixture weight of the k-th Gaussian component, satisfying Σ_{k=1}^{K} π_k = 1 and π_k ≥ 0; μ_k denotes the mean vector of the k-th Gaussian component; Σ_k denotes the covariance matrix of the k-th Gaussian component; K denotes the number of Gaussian components, i.e., the number of Gaussian distributions in the model; N(C; μ_k, Σ_k) denotes the multivariate normal distribution, expressed as:
N(C; μ_k, Σ_k) = (2π)^(−d/2)·|Σ_k|^(−1/2)·exp(−(1/2)·(C − μ_k)^T·Σ_k^(−1)·(C − μ_k))
wherein d denotes the dimension of the feature vector C.
3. The smart speaker permission design management method according to claim 2, characterized in that, for each new audio input, the user identity is identified by computing the log-likelihood of the new audio's feature vector under each user model, expressed as follows:
logL(C_new|S_i) = Σ_{t=1}^{T} log P(C_new(t)|Θ_i)
wherein logL(C_new|S_i) denotes the log-likelihood of the new feature vector under the i-th user model, used to measure how well the audio input matches that user model; C_new denotes the feature vector of the new audio input; S_i denotes the audio signature of the i-th user, i.e., the GMM parameter set Θ_i; T denotes the total number of frames of the new audio input, i.e., the number of frames after the audio signal is segmented; P(C_new(t)|Θ_i) denotes the probability of the t-th frame's feature vector C_new(t) under user model Θ_i.
4. The smart speaker permission design management method according to claim 1, characterized in that the real-time environment data are collected through multiple sensors, including a microphone, a camera, a light sensor, a temperature-humidity sensor, and an accelerometer, the real-time environment feature vector being denoted E;
the user-context joint feature vector is fused through a weighted multilayer perceptron model, expressed as follows:
C_e = σ(W_2·σ(W_1·[E_norm, S_i] + b_1) + b_2)
wherein C_e denotes the generated context feature vector, representing the user's behavior characteristics in the current environment; [E_norm, S_i] denotes the input features formed by concatenating the environment feature vector and the user audio signature; W_1, W_2 denote weight matrices of dimensions m×(n+1) and m×m respectively, learned through model training; b_1, b_2 denote bias vectors, both of dimension m, learned through model training; σ denotes a nonlinear activation function used to capture nonlinear relationships;
combining the context feature vector C_e and the user audio signature S_i generates the user-context joint feature vector F = [S_i, C_e];
a regularization term is added to the user-context joint feature vector F = [S_i, C_e] to achieve higher recognition accuracy through a weighted combination of user and environment features, expressed as follows:
F_opt = F + λ·(F⊙W_reg)
wherein F_opt denotes the optimized joint feature vector; λ denotes the regularization parameter, used to balance the influence of the original features and the regularization term; ⊙ denotes the element-wise product operation, used for element-by-element weighting; W_reg denotes the weight matrix representing the interrelations between features.
5. The smart speaker permission design management method according to claim 1, characterized in that the configuration initialization comprises:
designing a permission scoring function from the adjusted user-context joint feature vector, used to calculate the user's permission score in the current context, expressed as follows:
wherein the permission score reflects the user's permission level in the current context; the weight vector represents the importance of each feature; β denotes the weight adjustment coefficient, controlling the overall weight of the joint features; the two-norm of the joint features serves as a measure of feature importance, adding a constraint that balances the features; and a bias term is included;
dynamically adjusting the permission according to the calculated permission score and a preset permission-level threshold, wherein the permission adjustment function is expressed as follows:
wherein the dynamically adjusted permission level is determined relative to a permission threshold set according to user needs and system configuration.
6. The smart speaker permission design management method according to claim 1, characterized in that, in S3, an error-correction feedback mechanism is designed for the fuzzy matching and dynamic adjustment model to dynamically adjust the weights and parameters of the fuzzy matching model, the error-correction mechanism being expressed as follows:
Δw = ∈·(γ_target − γ)·C_new
wherein Δw denotes the weight adjustment vector, used to correct the weights of the speech features; ∈ denotes the learning rate, used to control the step size of the weight adjustment; γ_target denotes the target credibility score, usually set to the credibility level expected by the system; γ denotes the credibility score of the current speech input; C_new denotes the feature vector of the current speech input.
7. The smart speaker permission design management method according to claim 1, characterized in that a monitoring and anomaly detection model based on a long short-term memory (LSTM) network is constructed from the weighted behavior feature vector B_w to predict the behavior features at the next moment and identify abnormal behavior patterns, wherein the update formula of the behavior monitoring and anomaly detection model is expressed as follows:
h_t = f_LSTM(B_w,t, h_{t−1})
wherein h_t denotes the hidden state at the current time t, capturing the time dependency of user behavior; B_w,t denotes the weighted behavior feature vector at the current time t; h_{t−1} denotes the hidden state at the previous time t−1; f_LSTM denotes the state-update function of the LSTM network;
whether abnormal behavior exists is judged by computing the residual between the actual and predicted behavior features; the anomaly score ξ is expressed as follows:
wherein ξ denotes the anomaly score, representing the degree of difference between the current and predicted behavior; B_w,t denotes the actual weighted behavior feature vector at the current time t; the LSTM-predicted weighted behavior feature vector for the next moment is used as the prediction; σ² denotes the variance of the prediction residual, used to normalize the anomaly score;
whether a permission adjustment or warning mechanism is triggered is judged from the anomaly score ξ and a preset anomaly threshold θ, with the following adjustment strategy:
if ξ ≤ θ, no adjustment is made and the current permission level is maintained;
if ξ > θ, the behavior is identified as abnormal, and permissions are tightened or the user is prompted to authenticate according to the severity of the anomaly.
8. The smart speaker permission design management method according to claim 1, characterized in that the collection of the user's emotional state includes audio and video feature collection, after which a fusion model based on a self-attention mechanism is used to capture the mutual influence between the multimodal features, expressed as follows:
wherein E denotes the fused emotion feature vector, used to describe the user's current overall emotional state; A denotes the audio feature vector of dimension m; V denotes the video feature vector of dimension n; W_A and W_V denote projection matrices mapping the audio and video features into the same space, with dimensions m×d_k and n×d_k; d_k denotes the dimension of the key vectors in the attention mechanism, used to scale the dot-product operation; softmax denotes the normalization function used to compute the attention weights;
using the fused emotion feature vector E, the emotional state is classified through the emotion classifier model C(E) to generate the emotion label c, expressed as follows:
c = argmax_i (w_i^T·E + b_i)
wherein c denotes the current user's emotional-state label, expressed as a discrete category; w_i denotes the classifier model's weight vector for the i-th class, of dimension d_k; b_i denotes the classifier model's bias term for the i-th class;
according to the emotion-state label c and the abnormal-behavior response strategy, a mapping function between emotional state and permission response is proposed to dynamically adjust the permission level, expressed as follows:
wherein the emotionally adjusted permission level denotes the permission setting that takes the user's emotion into account; ρ(c) denotes the permission-adjustment increment function, depending on the user's emotional state c and expressed as an integer that may be positive or negative.
9. The smart speaker permission design management method according to claim 8, characterized in that the reinforcement-learning-based adaptive permission optimization model takes the user's current permission level and emotional state, together with the user's historical behavior pattern and system feedback, as the state vector S, wherein H denotes the user's historical behavior-pattern feature vector and F denotes the system feedback feature vector; the actions executable by the system are defined as the set of permission-adjustment operations, each action a_i representing an adjustment to the permission level; an optimization objective function J(θ) is defined, and the expected cumulative reward is maximized by optimizing the parameters θ of the policy π_θ(S);
the formula of the optimization objective function J(θ) is as follows:
J(θ) = E_{S∼D}[Σ_t γ^t·r(S_t, a_t)]
wherein J(θ) denotes the optimization objective function of the policy parameters θ, representing the expected cumulative reward; S denotes the current state vector, composed of the permission level, emotional state, historical behavior pattern, and system feedback; D denotes the state-distribution set, representing the space of possible states; γ denotes the discount factor, with 0 ≤ γ ≤ 1, used to measure the importance of future rewards; r(S_t, a_t) denotes the immediate reward obtained when action a_t is executed in state S_t, reflecting the effectiveness and reasonableness of the permission adjustment; a_t denotes the action executed at time t, determined by the policy π_θ(S);
in the policy-optimization process, a policy regularization term Ω(θ) is introduced to prevent the policy from overfitting the user's short-term behavior patterns, and the policy parameters θ are updated to maximize the regularized optimization objective, expressed as follows:
θ_{t+1} = θ_t + α·(∇_θ J(θ_t) − λ·∇_θ Ω(θ_t))
wherein θ_{t+1} denotes the updated policy parameters; θ_t denotes the policy parameters at the current time t; α denotes the learning rate, controlling the step size of the parameter update; ∇_θ J denotes the policy gradient, i.e., the gradient of the optimization objective with respect to the policy parameters; λ denotes the regularization strength parameter, controlling the influence of the regularization term on the policy update; Ω(θ_t) denotes the policy regularization term, a sparsity regularization that encourages simplicity and generalization of the policy parameters, expressed as follows:
Ω(θ) = (1/N)·Σ_{i=1}^{N} |θ_i|
wherein N denotes the total number of policy parameters; θ_i denotes the i-th policy parameter.
10. A smart speaker permission design management system, characterized in that the system comprises:
a user data collection module for collecting user audio data and preprocessing it, and constructing a Gaussian mixture model from the preprocessed data to output an audio signature;
a configuration initialization module for taking the audio signature as input representing the user's identity features, collecting and preprocessing real-time environment data, constructing a user-context joint feature vector from the identity features combined with the preprocessed environment data, adding a regularization term to the joint feature vector to adjust recognition accuracy, and then performing context awareness and dynamic permission configuration initialization to obtain an optimized permission level and permission score;
a permission initialization module for designing, according to the optimized permission level and permission score, a fuzzy matching and dynamic adjustment model for real-time dynamic management of the smart speaker's permissions if a complex speech environment with multiple languages and multiple accents appears; wherein the fuzzy matching and dynamic adjustment model calculates the credibility score γ of the speech feature vector C_new using a feature-weighted aggregation method, expressed as follows:
wherein γ denotes the credibility score of the speech input, representing the credibility of the current speech-feature match; M denotes the total number of speech features; w_j denotes the weight of the j-th feature, representing its importance when computing speech credibility; α_j denotes the fuzzy adjustment parameter of the j-th feature, controlling the sensitivity of feature matching; C_new,j denotes the j-th component of the speech feature vector; μ_j denotes the expected value of the j-th feature, representing its typical value under normal conditions;
using the credibility score γ of the speech input and the optimized permission score, the user's permission level is adjusted dynamically according to a permission adjustment function, expressed as follows:
wherein the dynamically adjusted permission level denotes the permission setting after adjustment according to the speech credibility and the original permission score; the initial permission level is that output by the second step; η denotes the permission-adjustment gain coefficient, controlling the adjustment amplitude; γ denotes the credibility score of the speech input; δ denotes the intermediate threshold of the credibility score; τ denotes the baseline threshold of the optimized permission score; according to the newly calculated permission level, the smart speaker system instantly adjusts the user's permission configuration, dynamically changing the accessibility of functions;
a permission management module for constructing, based on the new permission level optimized for the complex speech environment combined with real-time user behavior data, a behavior monitoring and anomaly detection model that analyzes user permissions in real time through feature engineering and time-series modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting the permission level according to the results of the anomaly detection and behavior monitoring; and for classifying, according to the permission level combined with the user's emotional state, the emotional state through an emotion classifier model to generate emotion labels, proposing a mapping function between emotional state and permission response according to the emotion-state labels and an abnormal-behavior response strategy to dynamically adjust the permission level, the smart speaker instantly applying the new permission configuration at the permission level calculated by the emotion-state-to-permission-response mapping function;
a system optimization module for constructing, according to the permission level, a reinforcement-learning-based adaptive permission optimization model that combines the user's long-term behavior and usage patterns and adaptively learns the smart speaker's permission strategy through reinforcement learning for long-term strategy optimization.
CN202411217025.7A 2024-09-02 2024-09-02 A smart speaker permission design management method and system Active CN119203100B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411217025.7A CN119203100B (en) 2024-09-02 2024-09-02 A smart speaker permission design management method and system


Publications (2)

Publication Number Publication Date
CN119203100A true CN119203100A (en) 2024-12-27
CN119203100B CN119203100B (en) 2025-06-20

Family

ID=94057533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411217025.7A Active CN119203100B (en) 2024-09-02 2024-09-02 A smart speaker permission design management method and system

Country Status (1)

Country Link
CN (1) CN119203100B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119741171A (en) * 2025-03-03 2025-04-01 Longyan University A smart education management system based on multi-user collaboration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763892A (en) * 2018-04-18 2018-11-06 Oppo Guangdong Mobile Communications Co., Ltd. Authority management method, device, mobile terminal and storage medium
WO2022268136A1 (en) * 2021-06-22 2022-12-29 Hisense Visual Technology Co., Ltd. Terminal device and server for voice control
CA3177530A1 (en) * 2021-07-14 2023-01-14 Strong Force TX Portfolio 2018, LLC Systems and methods with integrated gaming engines and smart contracts
CN118042355A (en) * 2024-04-11 2024-05-14 Jiangxi Tianchuang Intelligent Technology Co., Ltd. Automatic control system and method for intelligent sound control sound equipment of stage
CN118411983A (en) * 2024-04-16 2024-07-30 Beijing Sifang Zhihui Information Technology Co., Ltd. Data processing method based on speech recognition model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALICE COUCKE等: "Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces", ARXIV, 25 May 2018 (2018-05-25), pages 1 - 29 *
LI Zonglin: "Core technologies of tri-network convergence based on a dynamic network concept", Frontier Science, no. 01, 28 March 2010 (2010-03-28) *
CHEN Ru: "Implementation of an intelligent hotel property management system", Journal of Anhui Institute of Mechanical and Electrical Engineering, 31 January 2000 (2000-01-31), pages 75 - 78 *


Also Published As

Publication number Publication date
CN119203100B (en) 2025-06-20

Similar Documents

Publication Publication Date Title
CN118673165B (en) Multi-mode false news detection method and system based on text emotion characteristics and multi-level fusion
KR20190094319A (en) An artificial intelligence apparatus for performing voice control using voice extraction filter and method for the same
CN119203100B (en) A smart speaker permission design management method and system
CN119357888A (en) An intelligent behavior analysis system
CN120163653A (en) Dynamic risk control method, device, equipment and storage medium
CN119153132B (en) An artificial intelligence remote psychological consultation platform and method based on emotional cognition
CN120145312A (en) An intelligent security management system based on AI big model
CN120257050A (en) Dynamic adjustment method of robot behavior mode based on multimodal perception
CN119939229B (en) Network content propagation method and system based on fusion cognitive understanding and intelligent management
CN120781259A (en) Risk detection method, device, equipment and medium for big data abnormal behavior
CN120725209A (en) Marketing strategy optimization method based on consumer group behavior analysis
Gade et al. Speaker recognition using improved butterfly optimization algorithm with hybrid long short term memory network
CN120388565B (en) Voice interaction method and system based on 3D (three-dimensional) virtual
CN119441995B (en) A multimodal student classroom mental health early warning method and system
US20230342108A1 (en) Enhanced computing device representation of audio
Namburi Speaker recognition based on mutated monarch butterfly optimization configured artificial neural network
CN119132308A (en) A fraud prevention communication method based on voice change recognition
Alexeevskaya et al. Recognizing human emotions using a convolutional neural network
Bhardwaj et al. Identification of speech signal in moving objects using artificial neural network system
CN120354176B (en) Training methods for behavioral decision-making models and adaptive interaction methods for digital humans
CN117271743B (en) Multi-mode dialogue emotion recognition method and system
Samanta et al. RETRACTED ARTICLE: An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network
Lyu et al. Global and local feature fusion via long and short-term memory mechanism for dance emotion recognition in robot
US20250356874A1 (en) Artificial intelligence device and operating method thereof
Segarceanu et al. Evaluation of deep learning techniques for acoustic environmental events detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant