
CN111124860A - Method for identifying user by using keyboard and mouse data in uncontrollable environment - Google Patents


Info

Publication number
CN111124860A
CN111124860A (application number CN201911291751.2A; granted as CN111124860B)
Authority
CN
China
Prior art keywords
keyboard
mouse
action
user
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911291751.2A
Other languages
Chinese (zh)
Other versions
CN111124860B (en)
Inventor
廖永建
王栋
梁艺宽
吴宇
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201911291751.2A
Publication of CN111124860A
Application granted; publication of CN111124860B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3438 Recording or statistical evaluation of user activity; monitoring of user actions
    • G06F 11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/3041 Monitoring arrangements where the computing system component is an input/output interface
    • G06F 11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F 11/3452 Performance evaluation by statistical analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract


Figure 201911291751

The invention discloses a method for identifying a user by using keyboard and mouse data in an uncontrollable environment, comprising the following steps. Step 1, data collection: deploy a keyboard action collection program and a mouse action collection program on a computer, and collect keyboard metadata and mouse metadata during the computer's daily operation. Step 2, feature extraction: extract keyboard action features and mouse action features from the collected keyboard metadata and mouse metadata. Step 3, model training: use the extracted keyboard action features and mouse action features to train an evolutionary neural network of augmenting topology between each pair of users, obtaining a user identification model. Step 4, user identification: use the user identification model to identify the user to be identified. Combining keyboard action features and mouse action features is more effective than methods using only a single type of feature, and after training with the NEAT algorithm the method achieves a higher recognition rate than traditional SVM and neural-network algorithms.


Description

Method for identifying user by using keyboard and mouse data in uncontrollable environment
Technical Field
The invention relates to the technical field of network space behavior identification, in particular to a method for identifying a user by using keyboard and mouse data in an uncontrollable environment.
Background
In cyberspace, user identification has wide applications, such as personalized recommendation and system security. Biometric systems are a relatively common application that relies on the measurement of physiological or behavioral characteristics to determine or verify an individual's identity. In cyberspace, behavioral biometric systems rely primarily on input devices such as keyboards and mice, which are already present in most computers, so they incur little extra cost and require no additional hardware.
Analysis of keyboard and mouse rhythms, known as keystroke dynamics and mouse dynamics, has received increasing attention in recent years. Keystroke dynamics refers to the process of measuring and evaluating human typing rhythms on digital devices, which are quite unique to each person due to individual neurophysiological factors. The earliest use of keystroke timing for identity verification was proposed in 1980 by Gaines in the paper "Authentication by Keystroke Timing: Some Preliminary Results". In 2003, Gamboa and Fred collected mouse-movement and mouse-click data of volunteers playing a memory game on a web page for 10-15 minutes and used this behavioral information to verify the identity of individuals.
In the last few years, most studies have collected feature sets from several volunteers in a controlled environment to improve the accuracy of user recognition based on keystroke dynamics and mouse dynamics. Many statistical and machine-learning identification algorithms are widely used for user identification based on keystroke and mouse dynamics. For example, in 2012, Traore et al. introduced a risk-based authentication system for an experimental social networking site that achieved an EER of 8.21% using a BN model. In 2015, Wu et al. proposed an active user-behavior-recognition data-loss-prevention model that combines user keystroke and mouse behavior. However, due to differences between data acquisition environments and data sets, the results of these user identification methods differ significantly, which makes it difficult to reproduce the experimental results in practical application environments.
In actual human-computer interaction, mouse operation and keyboard operation are integrated; that is, a user completes a series of clicking and input actions through successive keyboard and mouse operations. There have been many studies on keystroke dynamics and mouse dynamics separately. However, few studies have considered real-environment human-computer interaction (HCI) behavior, which is a fusion of keyboard and mouse behavior data. Existing research on HCI behavior mainly uses traditional keyboard and mouse features combined with shallow machine-learning algorithms such as the Support Vector Machine (SVM) and Decision Tree (DT), and few studies consider the differences in users' keyboard and mouse operation behavior or effective methods of keyboard-mouse integration. Another problem is that most of these studies focus on controlled environments; user recognition accuracy is fairly low when uncontrolled-environment data sets are used.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to address the problems above, a method for identifying a user by using keyboard and mouse data in an uncontrollable environment is provided. The method extracts user features by monitoring mouse and keyboard behavior in the user's daily life in an uncontrollable environment, so as to achieve user identification.
The technical scheme adopted by the invention is as follows:
a method for identifying a user using keyboard and mouse data in an uncontrolled environment, comprising the steps of:
step 1, data acquisition: deploying a keyboard action acquisition program and a mouse action acquisition program in a computer, and acquiring keyboard metadata and mouse metadata in daily operation of the computer;
step 2, feature extraction: extracting keyboard action characteristics and mouse action characteristics from the collected keyboard metadata and mouse metadata;
step 3, model training: training an evolutionary neural network of augmenting topology between each pair of users by using the extracted keyboard action features and mouse action features, to obtain a user identification model;
step 4, user identification: and identifying the user to be identified by utilizing the user identification model.
Further, the method of step 1 is: the keyboard action acquisition program and the mouse action acquisition program deployed in a computer acquire keyboard metadata and mouse metadata during the daily operation of the computer through a hook chain; the keyboard metadata comprises a key name, a key-press timestamp, and a key-release timestamp; the mouse metadata comprises an operation type, a timestamp, and the x and y coordinates of the mouse pointer position.
Further, the method for extracting the keyboard action features in the step 2 comprises the following steps:
step 211, dividing the keyboard metadata into keyboard data sets of a time window t 1;
step 212, calculating the duration of each key and the delay time of the previous key and the next key by using the keyboard data set; the duration of each key is the average value of each pressing time of each key in a time window t1, and the pressing time is the difference between the key release time stamp of each key and the key pressing time stamp; the delay time is the difference value of the key pressing time stamp of the next key and the key releasing time stamp of the previous key;
step 213, taking the duration of each key in the keyboard data set as a duration feature and the delay time between the previous key and the next key as a delay-time feature, the keyboard action features include k duration features and k² delay-time features, wherein k is the number of keys of the keyboard used by the computer.
Further, before the keyboard action features are extracted, a delay threshold T1 is set; when the time during which the user stops using the keyboard exceeds T1, a new keyboard data sequence is split off at the delay, so as to remove data blanks from the keyboard action features.
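The delay-threshold segmentation described above can be sketched as follows. The event tuple format (key, press_ts, release_ts) and the helper name are illustrative assumptions, not the patent's actual implementation:

```python
# Split a timestamped keystroke stream into sessions whenever the gap between
# one key's release and the next key's press exceeds the delay threshold T1.
# Event format (key, press_ts, release_ts) is an assumption for illustration.

T1 = 60.0  # delay threshold in seconds (1 minute, as in the embodiment)

def split_sessions(events, threshold=T1):
    """events: list of (key, press_ts, release_ts) tuples sorted by press_ts."""
    sessions = []
    current = []
    for ev in events:
        if current and ev[1] - current[-1][2] > threshold:
            sessions.append(current)  # gap too long: close the current session
            current = []
        current.append(ev)
    if current:
        sessions.append(current)
    return sessions

stream = [("a", 0.0, 0.1), ("b", 0.5, 0.6), ("c", 100.0, 100.1)]
sessions = split_sessions(stream)  # gap 100.0 - 0.6 > 60 → two sessions
```

The same segmentation applies to the mouse stream with threshold T2; only the event format differs.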
Further, the method for extracting the mouse action features in the step 2 comprises the following steps:
step 221, dividing the mouse metadata into mouse data sets of a time window t 2;
step 222, calculating the direction, curvature angle and curvature distance of the mouse action by using the mouse data set:
(1) direction: for any two consecutive points B and C, moving from point B to point C along a straight line, the angle between that line and the horizontal is the direction of the mouse action;
(2) angle of curvature: for any three continuous points A, B and C, the included angle between the straight line from the point A to the point B and the straight line from the point B to the point C is the curvature angle;
(3) curvature distance: for any three points A, B and C in succession, the curvature distance of point B is the ratio of the perpendicular distance from point B to the line between point A and point C to the straight line distance from point A to point C;
step 223, respectively calculating the cumulative distribution function of the mouse motion direction, curvature angle and curvature distance in the mouse data set as the mouse motion characteristics.
Further, before the mouse action features are extracted, a delay threshold T2 is set; when the time during which the user stops using the mouse exceeds T2, a new mouse data sequence is split off at the delay, so as to remove data blanks from the mouse action features.
Further, the method of step 3 is:
step 31, performing z-score normalization processing on the extracted keyboard action features and mouse action features to enable the average value of the input keyboard action features and mouse action features to be 0 and the variance to be 1; after normalization processing, dividing all keyboard action characteristics and mouse action characteristics into a training set and a verification set;
step 32, using a one-to-one method, dividing the multi-class problem of N users into binary classification problems between N × (N−1)/2 user pairs, wherein each binary classification problem corresponds to an evolutionary neural network;
step 33, setting the number of input nodes of each evolutionary neural network to the number of keyboard action features and mouse action features in the training set, the number of hidden nodes to 0, and the number of output nodes to 2, wherein the two output nodes respectively correspond to the degrees of similarity to the two users of the classification, the highest similarity being 1 and the lowest 0; setting the Fitness function Fitness of each evolutionary neural network as:
Fitness=fitness-(output[0]-xo[0])**2-(output[1]-xo[1])**2
wherein fitness is the value expected to be reached, namely the number of samples in the training set; output is the output of the current evolutionary neural network for each training sample, with 2 output values per sample; and xo is the expected output. The larger Fitness is, the higher the recognition success rate of the evolutionary neural network; Fitness reaches its maximum when the output for every sample in the training set matches the expected output;
and step 34, training the N x (N-1)/2 evolutionary neural networks to obtain N x (N-1)/2 evolutionary neural network models with the highest user classification success rate one by one, and taking the models as user identification models.
Preferably, after the normalization process in step 31, 70% of all the keyboard motion features and mouse motion features are used as a training set, and 30% are used as a verification set.
Further, the method of step 4 is:
step 41, collecting keyboard metadata and mouse metadata of a user to be identified by using the method in step 1;
step 42, extracting the keyboard action characteristics and the mouse action characteristics of the user to be identified by using the method in the step 2;
step 43, inputting the keyboard action features and mouse action features of the user to be identified into the N × (N−1)/2 evolutionary neural network models obtained in step 3, taking the similarities output by each model as voting scores for the corresponding users, and summing the voting scores per user; the user with the highest total score is the user to be identified.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. High recognition rate and more comprehensive data sources: the invention combines keyboard action features and mouse action features, so its data sources are more comprehensive than those of methods using only a single type of feature, and after training with the NEAT algorithm it achieves a higher recognition rate than traditional SVM and neural-network algorithms.
2. Independence from the user's operating environment: because the direction, curvature angle, and curvature distance of mouse actions are not based on screen size or other elements of the user's environment, these features are relatively platform-independent. Such angle-based metrics are relatively stable across platforms and are not affected by the user's operating environment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of a method of identifying a user using keyboard and mouse data in an uncontrolled environment according to the present invention.
FIG. 2 is a diagram illustrating a mouse action feature extraction method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a method for identifying a user using keyboard and mouse data in an uncontrolled environment according to the present invention comprises the steps of:
step 1, data acquisition: deploying a keyboard action acquisition program and a mouse action acquisition program in a computer, and acquiring keyboard metadata and mouse metadata in daily operation of the computer;
step 2, feature extraction: extracting keyboard action characteristics and mouse action characteristics from the collected keyboard metadata and mouse metadata;
step 3, model training: training an evolutionary neural network of augmenting topology between each pair of users by using the extracted keyboard action features and mouse action features, to obtain a user identification model;
step 4, user identification: identifying the user to be identified by using the user identification model.
The features and properties of the present invention are described in further detail below with reference to examples.
Example 1
(1) Data acquisition:
The keyboard action acquisition program and mouse action acquisition program deployed in a computer acquire keyboard metadata and mouse metadata during the daily operation of the computer through a hook chain. To meet the requirements of data acquisition in a real environment, the two programs can be developed for the common Windows operating system; corresponding acquisition programs can likewise be developed for other operating systems as required. The two programs run automatically in the background of the operating system and collect as much keyboard metadata and mouse metadata from the user's daily keyboard and mouse operation as possible.
In this embodiment, the keyboard metadata includes a key name, a key-press timestamp, and a key-release timestamp; the mouse metadata includes an operation type, a timestamp, and the x and y coordinates of the mouse pointer position. Generally, the keyboard metadata is obtained through a global keyboard hook (WH_KEYBOARD), and the mouse metadata is obtained through a global mouse hook (WH_MOUSE). All keyboard metadata and mouse metadata are stored in multiple files for later processing.
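As a rough illustration of the metadata formats described above, the collected records might be modeled as follows. The field names and the final hold-time computation are assumptions for illustration, not the patent's actual collector code (which fills such records from the Windows WH_KEYBOARD and WH_MOUSE hooks):

```python
from dataclasses import dataclass

# Hypothetical record formats for the collected metadata; field names are
# illustrative, not taken from the patent's collector implementation.

@dataclass
class KeyEvent:
    key: str           # key name
    press_ts: float    # key-press timestamp (seconds)
    release_ts: float  # key-release timestamp (seconds)

@dataclass
class MouseEvent:
    op: str    # operation type, e.g. "move", "left_down", "left_up"
    ts: float  # timestamp (seconds)
    x: int     # pointer x coordinate
    y: int     # pointer y coordinate

# In a real collector these records would be appended to log files; here we
# just compute each key's hold time from two synthetic events.
events = [KeyEvent("A", 0.00, 0.12), KeyEvent("B", 0.30, 0.41)]
hold_times = [e.release_ts - e.press_ts for e in events]
```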
(2) Feature extraction:
(2.1) keyboard action characteristics:
Before extracting the keyboard action features, a delay threshold T1 is set (generally T1 = 1 minute). When the time during which the user stops using the keyboard exceeds T1, a new keystroke data sequence is split off at the delay, removing data blanks from the keyboard action features. Then the keyboard action features are extracted:
step 211, dividing the keyboard metadata into keyboard data sets of a time window t 1; taking t1 for 5 minutes generally;
step 212, calculating the duration of each key and the delay time of the previous key and the next key by using the keyboard data set; the duration of each key is the average value of each pressing time of each key in a time window t1, and the pressing time is the difference between the key release time stamp of each key and the key pressing time stamp; the delay time is the difference value of the key pressing time stamp of the next key and the key releasing time stamp of the previous key;
step 213, taking the duration of each key in the keyboard data set as a duration feature and the delay time between the previous key and the next key as a delay-time feature, the keyboard action features include k duration features and k² delay-time features, where k is the number of keys of the keyboard used by the computer. For example, for a typical keyboard with 110 keys, the keyboard action features include 110 duration features and 12100 delay-time features. However, since many key combinations occur only rarely while the user uses the keyboard, their influence on the whole data set is small, and the delay-time features with little influence can be deleted to reduce the amount of calculation.
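Steps 212-213 can be sketched as follows. The event tuple format and function name are illustrative assumptions; the code computes, within one time window, each key's mean hold duration and the latency of each consecutive key pair (digraph):

```python
from collections import defaultdict

# Sketch of step 212: per-key mean hold durations and previous-key → next-key
# latencies within one time window. Event format (key, press_ts, release_ts)
# is an assumption for illustration.

def keyboard_features(events):
    """events: list of (key, press_ts, release_ts) sorted by press_ts."""
    holds = defaultdict(list)
    for key, press, release in events:
        holds[key].append(release - press)  # press time of each key occurrence
    # duration feature: mean of the key's press times in the window
    durations = {k: sum(v) / len(v) for k, v in holds.items()}
    latencies = {}
    for (k1, _, rel1), (k2, press2, _) in zip(events, events[1:]):
        # delay time: next key's press timestamp minus previous key's release
        latencies[(k1, k2)] = press2 - rel1
    return durations, latencies

evts = [("a", 0.0, 0.1), ("b", 0.3, 0.5), ("a", 0.7, 0.9)]
durations, latencies = keyboard_features(evts)
```

With k distinct keys this yields up to k duration features and k² latency features, matching the counts in step 213.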
(2.2) mouse action characteristics
Before extracting the mouse action features, a delay threshold T2 is set (generally T2 = 30 seconds). When the time during which the user stops using the mouse exceeds T2, a new mouse data sequence is split off at the delay, removing data blanks from the mouse action features. Then the mouse action features are extracted:
step 221, dividing the mouse metadata into mouse data sets of a time window t 2; taking t2 for 5 minutes generally;
step 222, as shown in fig. 2, calculating the direction, curvature angle and curvature distance of the mouse action by using the mouse data set:
(1) direction: for any two consecutive points B and C, moving from point B to point C along a straight line, the angle between that line and the horizontal is the direction of the mouse action, such as angle X in FIG. 2;
(2) angle of curvature: for any three consecutive points A, B and C, the angle between the straight line from point A to point B and the straight line from point B to point C is the curvature angle, such as the angle Y in FIG. 2;
(3) curvature distance: for any three points A, B and C in succession, the curvature distance of point B is the ratio of the perpendicular distance from point B to the line between point A and point C to the straight line distance from point A to point C; the curvature distance is unitless because it is the ratio of the two distances.
Step 223, the cumulative distribution functions of the mouse action direction (range 0-360°), curvature angle (range 0-180°), and curvature distance (range 0-200) in the mouse data set are calculated as the mouse action features. If a direction, curvature angle, or curvature distance value exceeds its range, it is clamped to the corresponding upper or lower limit.
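The three mouse features and their empirical CDFs can be sketched as follows. The formulas follow the definitions in the text; the reading of "angle between line A→B and line B→C" as the turning angle at B, and the fixed-bin empirical CDF, are assumptions for illustration:

```python
import math

# Sketch of step 222: direction, curvature angle, and curvature distance of a
# mouse trajectory given as (x, y) points, plus an empirical CDF (step 223).

def direction(b, c):
    """Angle between segment B→C and the horizontal, in degrees [0, 360)."""
    return math.degrees(math.atan2(c[1] - b[1], c[0] - b[0])) % 360.0

def curvature_angle(a, b, c):
    """Angle at B between segments A→B and B→C, in degrees [0, 180]."""
    diff = abs(direction(a, b) - direction(b, c)) % 360.0
    return min(diff, 360.0 - diff)

def curvature_distance(a, b, c):
    """Perpendicular distance from B to line AC, divided by |AC| (unitless)."""
    ax, ay = a; bx, by = b; cx, cy = c
    ac = math.hypot(cx - ax, cy - ay)
    # |cross product of (C-A) and (B-A)| / |AC| = perpendicular distance
    perp = abs((cx - ax) * (by - ay) - (cy - ay) * (bx - ax)) / ac
    return perp / ac

def empirical_cdf(values, lo, hi, bins=8):
    """Empirical CDF of values clamped to [lo, hi], sampled at `bins` points."""
    clamped = [min(max(v, lo), hi) for v in values]
    edges = [lo + (hi - lo) * (i + 1) / bins for i in range(bins)]
    n = len(clamped)
    return [sum(v <= e for v in clamped) / n for e in edges]

# Example: A=(0,0), B=(1,0), C=(1,1) gives a 90° turn at B and ratio 0.5.
cdf = empirical_cdf([0, 90, 180, 270], 0, 360, bins=4)  # direction CDF sample
```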
(3) Model training:
step 31, performing z-score normalization processing on the extracted keyboard action features and mouse action features to enable the average value of the input keyboard action features and mouse action features to be 0 and the variance to be 1; after normalization processing, dividing all keyboard action characteristics and mouse action characteristics into a training set and a verification set; generally, 70% of all the keyboard action features and mouse action features are taken as a training set, and 30% are taken as a verification set. The training set is used as input of training the evolutionary neural network, and the verification set is used for verifying the recognition success rate of the evolutionary neural network model obtained through training.
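Step 31 can be sketched as follows; the column-wise data layout and the helper names are assumptions for illustration, not the patent's code:

```python
import random
import statistics

# Sketch of step 31: z-score normalize each feature column (mean 0, variance 1),
# then split the samples 70/30 into training and validation sets.

def zscore_columns(rows):
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def train_val_split(rows, train_frac=0.7, seed=0):
    rows = rows[:]
    random.Random(seed).shuffle(rows)  # shuffle before the 70/30 cut
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
norm = zscore_columns(data)            # each column now has mean 0, variance 1
train, val = train_val_split(norm)     # 70% training, 30% validation
```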
Step 32, using a one-to-one method (one-vs-one), the multi-class problem of N users is divided into binary classification problems between N × (N−1)/2 user pairs, each binary problem corresponding to one evolutionary neural network (NEAT), such as 1 vs 2, ..., 1 vs N, 2 vs 3, ..., 2 vs N, ..., N−1 vs N shown in FIG. 1;
Here, the one-to-one method (one-vs-one) means: given n classes, a binary classifier is built for every pair of classes, yielding k = n(n−1)/2 classifiers. When new data are classified, the k classifiers are applied in turn, each classification counting as one vote for the predicted class. After all k classifiers have voted, the class with the most votes is taken as the final classification result. Therefore, in this embodiment, the one-vs-one method divides the N-user multi-class problem into binary classification problems between N × (N−1)/2 user pairs, corresponding to N × (N−1)/2 evolutionary neural networks, each of which is one classifier. The training set input to each classifier consists of the keyboard action features and mouse action features of its two users, and the output is the similarity to each of the two users;
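The one-vs-one decomposition above can be sketched in a few lines; the function name is an illustrative assumption:

```python
from itertools import combinations

# Sketch of the one-vs-one decomposition: N users yield N*(N-1)/2 user pairs,
# each pair getting its own binary classifier (here just enumerated).

def user_pairs(n_users):
    return list(combinations(range(1, n_users + 1), 2))

pairs = user_pairs(5)  # 5 users → 10 pairwise binary classifiers
```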
step 33, setting the number of input nodes of each evolutionary neural network to the number of keyboard action features and mouse action features in the training set, the number of hidden nodes to 0, and the number of output nodes to 2, wherein the two output nodes respectively correspond to the degrees of similarity to the two users of the classification, the highest similarity being 1 and the lowest 0; setting the Fitness function Fitness of each evolutionary neural network as:
Fitness=fitness-(output[0]-xo[0])**2-(output[1]-xo[1])**2
wherein fitness is the value expected to be reached, namely the number of samples in the training set; output is the output of the current evolutionary neural network for each training sample, with 2 output values per sample; and xo is the expected output. The larger Fitness is, the higher the recognition success rate of the evolutionary neural network; Fitness reaches its maximum when the output for every sample in the training set matches the expected output;
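The fitness function above can be sketched as follows, accumulating the per-sample squared-error penalty over the training set; the function and argument names are illustrative assumptions:

```python
# Sketch of the fitness function in step 33, following the formula in the text:
# Fitness = fitness - (output[0]-xo[0])**2 - (output[1]-xo[1])**2, applied to
# each training sample. `max_fitness` starts at the expected maximum so a
# perfect network scores exactly that maximum.

def evaluate_fitness(net_outputs, expected_outputs, max_fitness):
    """net_outputs/expected_outputs: lists of 2-element output vectors."""
    fitness = max_fitness
    for output, xo in zip(net_outputs, expected_outputs):
        fitness -= (output[0] - xo[0]) ** 2
        fitness -= (output[1] - xo[1]) ** 2
    return fitness

# Perfect predictions leave Fitness at its maximum; errors reduce it.
perfect = evaluate_fitness([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], 2.0)
worse = evaluate_fitness([[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]], 2.0)
```

In practice the NEAT training loop (e.g. with an evolutionary-neural-network library) would call such an evaluation for every genome in each generation.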
and step 34, training the N x (N-1)/2 evolutionary neural networks to obtain N x (N-1)/2 evolutionary neural network models with the highest user classification success rate one by one, and taking the models as user identification models.
(4) User identification:
step 41, collecting keyboard metadata and mouse metadata of a user to be identified by using the method in step 1;
step 42, extracting the keyboard action characteristics and the mouse action characteristics of the user to be identified by using the method in the step 2;
Step 43, inputting the keyboard action features and mouse action features of the user to be identified into the N × (N−1)/2 evolutionary neural network models obtained in step 3, taking the similarities output by each model as voting scores for the corresponding users, and summing the voting scores per user; the user with the highest total score is the user to be identified.
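The score accumulation in step 43 can be sketched as follows; the model outputs here are made-up numbers for illustration:

```python
from collections import defaultdict

# Sketch of step 43: each pairwise model outputs two similarity scores, one per
# user of its pair; scores are accumulated per user and the highest total wins.

def identify(pair_scores):
    """pair_scores: {(user_i, user_j): (score_i, score_j)} from the
    N*(N-1)/2 pairwise models; returns the user with the highest total."""
    totals = defaultdict(float)
    for (ui, uj), (si, sj) in pair_scores.items():
        totals[ui] += si
        totals[uj] += sj
    return max(totals, key=totals.get)

scores = {
    (1, 2): (0.9, 0.1),
    (1, 3): (0.8, 0.2),
    (2, 3): (0.4, 0.6),
}
winner = identify(scores)  # user 1 accumulates 1.7, the highest total
```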
As can be seen from the above, the invention has the following beneficial effects:
1. High recognition rate and more comprehensive data sources: the invention combines keyboard action features and mouse action features, so its data sources are more comprehensive than those of methods using only a single type of feature, and after training with the NEAT algorithm it achieves a higher recognition rate than traditional SVM and neural-network algorithms.
2. Independence from the user's operating environment: because the direction, curvature angle, and curvature distance of mouse actions are not based on screen size or other elements of the user's environment, these features are relatively platform-independent. Such angle-based metrics are relatively stable across platforms and are not affected by the user's operating environment.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1.一种在不可控环境下使用键盘和鼠标数据识别用户的方法,其特征在于,包括如下步骤:1. a method for using keyboard and mouse data to identify a user under uncontrollable environment, is characterized in that, comprises the steps: 步骤1,数据采集:在计算机中部署键盘动作采集程序和鼠标动作采集程序,采集计算机日常操作中的键盘元数据和鼠标元数据;Step 1, data collection: deploy the keyboard action collection program and the mouse action collection program in the computer, and collect the keyboard metadata and mouse metadata in the daily operation of the computer; 步骤2,特征提取:从采集的键盘元数据和鼠标元数据中,提取键盘动作特征和鼠标动作特征;Step 2, feature extraction: from the collected keyboard metadata and mouse metadata, extract keyboard action features and mouse action features; 步骤3,模型培训:利用提取的键盘动作特征和鼠标动作特征在每个用户之间训练一个增强拓扑的进化神经网络,得到用户识别模型;Step 3, model training: use the extracted keyboard action features and mouse action features to train an evolutionary neural network with enhanced topology between each user to obtain a user identification model; 步骤4,用户识别:利用用户识别模型对待识别用户进行识别。Step 4, user identification: use the user identification model to identify the user to be identified. 2.根据权利要求1所述的在不可控环境下使用键盘和鼠标数据识别用户的方法,其特征在于,步骤1的方法为:在计算机中部署的键盘动作采集程序和鼠标动作采集程序通过钩子链表采集计算机日常操作中的键盘元数据和鼠标元数据;所述键盘元数据包括按键名称、按键按下时间戳和按键释放时间戳;所述鼠标元数据包括操作类型、时间戳以及鼠标指针位置的x和y坐标。2. the method for using keyboard and mouse data to identify a user under uncontrollable environment according to claim 1, is characterized in that, the method of step 1 is: the keyboard motion acquisition program and the mouse motion acquisition program deployed in the computer pass the hook The linked list collects keyboard metadata and mouse metadata in the daily operation of the computer; the keyboard metadata includes key names, key press timestamps and key release timestamps; the mouse metadata includes operation type, timestamp and mouse pointer position the x and y coordinates. 3.根据权利要求2所述的在不可控环境下使用键盘和鼠标数据识别用户的方法,其特征在于,步骤2中提取键盘动作特征的方法为:3. 
3. The method of claim 2, characterized in that the keyboard-action features of step 2 are extracted as follows:

Step 211, split the keyboard metadata into keyboard data sets of time window t1.

Step 212, use each keyboard data set to compute the duration of each key and the delay time between consecutive keys. The duration of a key is the average of its press times within time window t1, where a press time is the key-release timestamp minus the key-press timestamp of that key; the delay time is the key-press timestamp of the following key minus the key-release timestamp of the preceding key.

Step 213, take the duration of each key as a duration feature and the delay time of each ordered pair of keys as a delay-time feature; the keyboard-action features then comprise k duration features and k² delay-time features, where k is the number of keys on the keyboard in use.

4. The method of claim 3, characterized in that, before the keyboard-action features are extracted, a delay threshold T1 is set; whenever the user stops using the keyboard for longer than T1, a new keyboard data sequence is split off at that delay, which removes idle gaps from the keyboard-action features.
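The duration and delay computation of steps 211-213 can be rendered as a short sketch. The patent specifies no code, so the function name and the `(key, press_ts, release_ts)` event layout below are illustrative assumptions:

```python
from collections import defaultdict

def keyboard_features(events, t1_start, t1_end):
    """Compute duration and delay features from key events in one time window.

    events: list of (key_name, press_ts, release_ts) tuples, sorted by press_ts.
    Returns (durations, delays):
      durations[key]          = mean press time of that key in the window
      delays[(prev, nxt)]     = mean gap between prev key's release and the
                                next key's press (claim 3, step 212)
    """
    window = [e for e in events if t1_start <= e[1] < t1_end]

    # Per-key press durations: release timestamp minus press timestamp.
    per_key = defaultdict(list)
    for key, press, release in window:
        per_key[key].append(release - press)
    durations = {k: sum(v) / len(v) for k, v in per_key.items()}

    # Per-pair delays: next key's press minus previous key's release.
    per_pair = defaultdict(list)
    for (k1, _, r1), (k2, p2, _) in zip(window, window[1:]):
        per_pair[(k1, k2)].append(p2 - r1)
    delays = {pair: sum(v) / len(v) for pair, v in per_pair.items()}
    return durations, delays
```

With a k-key keyboard this yields at most k duration features and k² delay features per window, matching step 213.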
5. The method of claim 1, characterized in that the mouse-action features of step 2 are extracted as follows:

Step 221, split the mouse metadata into mouse data sets of time window t2.

Step 222, use each mouse data set to compute the direction, curvature angle and curvature distance of the mouse movements:

(1) Direction: for any two consecutive points B and C, the pointer travels from B to C along a straight line; the angle between that line and the horizontal is the direction of the mouse movement.

(2) Curvature angle: for any three consecutive points A, B and C, the angle between the line from A to B and the line from B to C is the curvature angle.

(3) Curvature distance: for any three consecutive points A, B and C, the curvature distance of point B is the ratio of the perpendicular distance from B to the line through A and C to the straight-line distance from A to C.

Step 223, compute the cumulative distribution functions of the direction, curvature angle and curvature distance over the mouse data set, and take them as the mouse-action features.

6. The method of claim 3, characterized in that, before the mouse-action features are extracted, a delay threshold T2 is set; whenever the user stops using the mouse for longer than T2, a new mouse data sequence is split off at that delay, which removes idle gaps from the mouse-action features.
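The three geometric quantities of step 222 follow directly from consecutive pointer samples; a minimal sketch (function names are assumptions, angles in radians):

```python
import math

def direction(b, c):
    """Angle between segment B->C and the horizontal (claim 5, feature 1)."""
    return math.atan2(c[1] - b[1], c[0] - b[0])

def curvature_angle(a, b, c):
    """Angle between segments A->B and B->C (claim 5, feature 2)."""
    return direction(b, c) - direction(a, b)

def curvature_distance(a, b, c):
    """Perpendicular distance from B to line A-C, divided by |AC| (feature 3)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    ac = math.hypot(cx - ax, cy - ay)
    # |cross((C - A), (B - A))| / |AC| is the perpendicular distance from B.
    perp = abs((cx - ax) * (by - ay) - (cy - ay) * (bx - ax)) / ac
    return perp / ac
```

Step 223 would then take the empirical cumulative distribution function of each quantity over the window t2 as the feature vector.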
7. The method of claim 3, characterized in that step 3 proceeds as follows:

Step 31, apply z-score normalization to the extracted keyboard-action and mouse-action features, so that the input features have mean 0 and variance 1; after normalization, divide all keyboard-action and mouse-action features into a training set and a validation set.

Step 32, use the one-vs-one method to split the N-user multi-class problem into N*(N-1)/2 binary classification problems between pairs of users, each binary problem corresponding to one NEAT network.

Step 33, set the number of input nodes of each NEAT network to the number of keyboard-action and mouse-action features in the training set, the number of hidden nodes to 0, and the number of output nodes to 2; the two outputs give the degree of similarity to the two users being classified, 1 being the highest and 0 the lowest. Set the fitness function Fitness of each network to:
Fitness = fitness - (output[0] - xo[0])**2 - (output[1] - xo[1])**2

where fitness is the target value to be reached, equal to twice the number of samples in the training set; output is the network's output for each sample of the training set, and xo is the expected output. A larger Fitness means a higher identification success rate; Fitness reaches its maximum when the output for every training sample matches the expected output.

Step 34, train the N*(N-1)/2 NEAT networks and retain, for each pair of users, the network with the highest one-vs-one classification success rate; these networks together constitute the user identification model.

8. The method of claim 7, characterized in that in step 31, after normalization, 70% of all keyboard-action and mouse-action features are used as the training set and 30% as the validation set.
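Steps 31-33 combine three standard pieces: z-score normalization, one-vs-one pair enumeration, and the claimed fitness formula. A sketch of all three, assuming plain-Python inputs (in a NEAT library such as neat-python, the fitness function would run inside the genome-evaluation callback):

```python
import itertools
import statistics

def z_score(column):
    """Normalize one feature column to mean 0, variance 1 (claim 7, step 31)."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column) or 1.0  # guard against constant columns
    return [(x - mu) / sigma for x in column]

def one_vs_one_pairs(user_ids):
    """Enumerate the N*(N-1)/2 binary problems of step 32."""
    return list(itertools.combinations(user_ids, 2))

def neat_fitness(net_outputs, expected_outputs):
    """Fitness of one candidate network, transcribing the claimed formula.

    Starts from the maximum attainable value (2 per training sample, since
    each squared-error term is bounded by 1 for outputs in [0, 1]) and
    subtracts the squared error on both output nodes for every sample.
    """
    fitness = 2 * len(expected_outputs)
    for output, xo in zip(net_outputs, expected_outputs):
        fitness -= (output[0] - xo[0]) ** 2
        fitness -= (output[1] - xo[1]) ** 2
    return fitness
```

With N users, `one_vs_one_pairs` produces exactly the N*(N-1)/2 pairings the claim requires, and `neat_fitness` peaks at twice the training-set size when every output matches its expected value.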
9. The method of claim 7, characterized in that step 4 proceeds as follows:

Step 41, collect the keyboard metadata and mouse metadata of the user to be identified by the method of step 1.

Step 42, extract the keyboard-action features and mouse-action features of the user to be identified by the method of step 2.

Step 43, feed the keyboard-action and mouse-action features of the user to be identified into the N*(N-1)/2 NEAT models obtained in step 3; treat the similarity output by each model as a vote score for the corresponding user and sum the scores per user; the user with the highest total score is the identified user.
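The voting of step 43 can be sketched as follows; the `models` dict keyed by user pairs, with each model returning its two similarity outputs, is an illustrative assumption about how the trained networks are stored:

```python
from collections import defaultdict

def identify_user(models, sample):
    """One-vs-one voting over the N*(N-1)/2 pairwise models (claim 9, step 43).

    models: dict mapping (user_a, user_b) -> callable returning the two
            similarity scores (the network's output nodes) for `sample`.
    Returns the user with the highest accumulated similarity score.
    """
    scores = defaultdict(float)
    for (user_a, user_b), model in models.items():
        sim_a, sim_b = model(sample)
        scores[user_a] += sim_a
        scores[user_b] += sim_b
    return max(scores, key=scores.get)
```

Each pairwise model contributes its two similarity outputs to the two users it was trained on, so a user consistently favored across pairs accumulates the highest total and is returned as the identification result.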
CN201911291751.2A 2019-12-16 2019-12-16 A method for identifying users using keyboard and mouse data in an uncontrolled environment Active CN111124860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911291751.2A CN111124860B (en) 2019-12-16 2019-12-16 A method for identifying users using keyboard and mouse data in an uncontrolled environment


Publications (2)

Publication Number Publication Date
CN111124860A true CN111124860A (en) 2020-05-08
CN111124860B CN111124860B (en) 2021-04-27

Family

ID=70499002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911291751.2A Active CN111124860B (en) 2019-12-16 2019-12-16 A method for identifying users using keyboard and mouse data in an uncontrolled environment

Country Status (1)

Country Link
CN (1) CN111124860B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040095384A1 (en) * 2001-12-04 2004-05-20 Applied Neural Computing Ltd. System for and method of web signature recognition system based on object map
US20070060114A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Predictive text completion for a mobile communication facility
CN104809377A (en) * 2015-04-29 2015-07-29 西安交通大学 Method for monitoring network user identity based on webpage input behavior characteristics
CN106445101A (en) * 2015-08-07 2017-02-22 飞比特公司 Method and system for identifying user
CN107423549A (en) * 2016-04-21 2017-12-01 唯亚威解决方案股份有限公司 fitness tracker
CN109871673A (en) * 2019-03-11 2019-06-11 重庆邮电大学 Method and system for continuous identity authentication based on different contexts
CN110443012A (en) * 2019-06-10 2019-11-12 中国刑事警察学院 Personal identification method based on keystroke characteristic


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PURVASHI BAYNATH: "Pattern representation using Neuroevolution of the augmenting topology (NEAT) on Keystroke dynamics features in Biometrics", IEEE *
WANG Yong: "A Traffic Identification Method Based on User Behavior State Features", Application Research of Computers *
WANG Zhenhui: "User Identity Authentication Based on Combined Mouse and Keyboard Behavioral Features", Computer Applications and Software *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269937A (en) * 2020-11-16 2021-01-26 加和(北京)信息科技有限公司 Method, system and device for calculating user similarity
CN112269937B (en) * 2020-11-16 2024-02-02 加和(北京)信息科技有限公司 Method, system and device for calculating user similarity
CN116633586A (en) * 2023-04-07 2023-08-22 北京胜博雅义网络科技有限公司 Identification authentication analysis system based on Internet of things

Also Published As

Publication number Publication date
CN111124860B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN103530540B (en) User identity attribute detection method based on man-machine interaction behavior characteristics
Sae-Bae et al. Online signature verification on mobile devices
Monrose et al. Keystroke dynamics as a biometric for authentication
Uludag et al. Biometric template selection and update: a case study in fingerprints
CN100356388C (en) Biocharacteristics fusioned identity distinguishing and identification method
Idrus et al. Soft biometrics for keystroke dynamics: Profiling individuals while typing passwords
CN104809377B (en) Network user identity monitoring method based on webpage input behavior feature
Ansari et al. Online signature verification using segment‐level fuzzy modelling
EP2477136A1 (en) Method for continuously verifying user identity via keystroke dynamics
Roth et al. Biometric authentication via keystroke sound
Rahman et al. Making impostor pass rates meaningless: A case of snoop-forge-replay attack on continuous cyber-behavioral verification with keystrokes
CN111124860B (en) A method for identifying users using keyboard and mouse data in an uncontrolled environment
Migdal et al. Statistical modeling of keystroke dynamics samples for the generation of synthetic datasets
Kratky et al. Recognition of web users with the aid of biometric user model
Rodriguez et al. Introducing a semi‐automatic method to simulate large numbers of forensic fingermarks for research on fingerprint identification
Kochegurova et al. Aspects of continuous user identification based on free texts and hidden monitoring
Sudhish et al. Adaptive fusion of biometric and biographic information for identity de-duplication
Pelto et al. Your identity is your behavior-continuous user authentication based on machine learning and touch dynamics
Khoh et al. Score level fusion approach in dynamic signature verification based on hybrid wavelet‐Fourier transform
Poh et al. A methodology for separating sheep from goats for controlled enrollment and multimodal fusion
Nyssen et al. A multi-stage online signature verification system
JP5895751B2 (en) Biometric authentication device, retry control program, and retry control method
Yu et al. Mental workload classification via online writing features
Monaco Classification and authentication of one-dimensional behavioral biometrics
KR20120134484A (en) Method and apparatus for user authentication based on keystroke dynamics pattern data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant