CN117974740A - Acupoint positioning method and robot based on aggregation type window self-attention mechanism - Google Patents
Acupoint positioning method and robot based on aggregation type window self-attention mechanism
- Publication number
- CN117974740A (application CN202410385415.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- window
- image
- self-attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an acupoint positioning method based on an aggregated window self-attention mechanism. First, an image of a human back is acquired and preprocessed. The preprocessed back image is fed into a multi-scale feature extraction network to obtain a back feature map; the network comprises several feature map extraction modules that process the image in sequence, each containing an aggregated window self-attention learning layer. The back feature map is then fed into a back feature key point detection network to detect back feature key points, and the specific coordinates of the back acupoints are derived from the key points through a back acupoint positioning formula, achieving back acupoint positioning. When extracting the features of the back feature key points, an aggregated window self-attention learning method is applied according to the position of each key point on the back, so that key point positions can be determined more accurately and quickly for human backs of different body shapes.
Description
Technical Field
The present invention relates to physiotherapy robots, and in particular to an acupoint positioning method and robot based on an aggregated window self-attention mechanism.
Background Art
Acupoints are an important part of traditional medicine: massaging them can prevent and treat disease and improve health. Physicians usually select and press acupoints based on long accumulated massage experience, but this approach demands years of practice, and clinical practitioners of Chinese massage are few in number and unevenly skilled. In recent years, with the development of computer vision and robotics, researchers have proposed methods based on machine vision and image processing to locate acupoints and have applied them to robots, making acupoint selection and pressing intelligent.
However, most current massage robots based on visual positioning place strict demands on the working environment: they require a plain massage background, do not generalize to complex backgrounds, and have low positioning accuracy, leaving them far behind professional masseurs in intelligence.
Summary of the Invention
Purpose of the invention: In view of the above shortcomings, the present invention provides an acupoint positioning method and robot based on an aggregated window self-attention mechanism with high positioning accuracy.
Technical solution: To solve the above problems, the present invention adopts an acupoint positioning method based on an aggregated window self-attention mechanism, comprising the following steps:
(1) Acquiring an image of a human back and preprocessing it to obtain a preprocessed back image;
(2) Inputting the preprocessed back image into a multi-scale feature extraction network to obtain a back feature map; the multi-scale feature extraction network comprises several feature map extraction modules, and the preprocessed back image is processed by these modules in sequence; each feature map extraction module comprises an aggregated window self-attention learning layer, which strengthens the feature learning of pixels in the extracted feature map that contain feature key points;
(3) Inputting the back feature map into a back feature key point detection network to detect back feature key points;
(4) Obtaining the specific coordinates of the back acupoints from the back feature key points through a back acupoint positioning formula, thereby achieving back acupoint positioning.
Further, preprocessing the human back image comprises:
obtaining the depth information of the human back image;
fusing the depth information with the human back image through a depth information fusion formula to obtain a visible depth information map;
using the binarized image of the human back image as a mask to compensate for the missing regions of the visible depth information map, obtaining the preprocessed back image.
Further, obtaining the depth information of the human back image specifically comprises: obtaining a disparity map of the human back image and converting the disparity map into a depth map through a depth conversion formula, the depth information being calculated as:

$$Z = \frac{B \cdot f}{d}, \qquad d = x_l - x_r$$

where $Z$ is the depth information, $d$ is the disparity, $B$ is the baseline length, $f$ is the focal length, and $x_l$ and $x_r$ are the horizontal coordinates of the same position in the disparity maps corresponding to the left and right cameras of the depth binocular camera.
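As a rough illustration of this conversion, the following sketch applies the formula over a disparity map; the camera parameters shown are placeholder values, not taken from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px):
    """Convert a disparity map to a depth map via Z = B * f / d.

    disparity:  (H, W) array of d = x_l - x_r in pixels
    baseline_m: baseline length B between the left and right cameras
    focal_px:   focal length f expressed in pixels
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > 0                      # zero disparity carries no depth
    depth[valid] = baseline_m * focal_px / disparity[valid]
    return depth

# Example with assumed, roughly RealSense-like parameters:
d = np.random.uniform(1.0, 60.0, size=(480, 640))
Z = disparity_to_depth(d, baseline_m=0.05, focal_px=610.0)
```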
Further, the depth information fusion formula is:

$$X_i = \frac{(u_i - u_0)\,\mathrm{d}x \cdot D_i}{f}, \qquad Y_i = \frac{(v_i - v_0)\,\mathrm{d}y \cdot D_i}{f}, \qquad Z_i = D_i$$

where $D_i$ is the depth information of the $i$-th pixel in the human back image, $u_i$ and $v_i$ are the horizontal and vertical coordinates of the $i$-th pixel, $u_0$ and $v_0$ are the horizontal and vertical coordinates of the intersection of the camera optical axis with the imaging plane, $(X_i, Y_i, Z_i)$ are the fusion point coordinates of the $i$-th pixel in the visible depth information map, and $\mathrm{d}x$ and $\mathrm{d}y$ are the lengths corresponding to a single pixel along the $x$-axis and $y$-axis directions.
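Under the standard pinhole model that these variable definitions imply, the per-pixel fusion can be sketched as below; the vectorized form and the parameter values are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def back_project(depth, u0, v0, f, dx, dy):
    """Map each pixel (u, v) with depth D to fusion coordinates (X, Y, Z).

    Assumes the pinhole relations X = (u - u0) * dx * D / f,
    Y = (v - v0) * dy * D / f, Z = D, with the principal point (u0, v0)
    at the intersection of the optical axis and the imaging plane.
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    X = (u - u0) * dx * depth / f
    Y = (v - v0) * dy * depth / f
    Z = depth
    return np.stack([X, Y, Z], axis=-1)   # (H, W, 3) fusion point map

depth = np.full((480, 640), 1.0)          # toy depth map, 1 m everywhere
pts = back_project(depth, u0=320.0, v0=240.0, f=1.93e-3, dx=3.0e-6, dy=3.0e-6)
```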
Further, the preprocessed back image is obtained by randomized fusion of the binarized image of the human back image and the visible depth information map using a binary degree number; in the fusion formula, a regression reward model is applied together with the constant $\pi$ (an infinite non-repeating decimal whose value is approximately 3.14159) and a reward degree parameter.
Further, the multi-scale feature extraction network comprises a first-level, a second-level, a third-level and a fourth-level feature map extraction module. The preprocessed back image is input into the first-level feature map extraction module, which outputs a first-level feature map; the first-level module comprises a patch partition transformation layer and the first aggregated window self-attention learning layer, the patch partition transformation layer performing patch cropping and depth compression of the preprocessed back image.
The first-level feature map is input into the second-level feature map extraction module, which outputs a second-level feature map; the second-level module comprises the first patch merging layer and the second aggregated window self-attention learning layer.
The second-level feature map is input into the third-level feature map extraction module, which outputs a third-level feature map; the third-level module comprises the second patch merging layer and the third aggregated window self-attention learning layer.
The third-level feature map is input into the fourth-level feature map extraction module, which outputs a fourth-level feature map, i.e., the back feature map; the fourth-level module comprises the third patch merging layer and the fourth aggregated window self-attention learning layer. The first, second and third patch merging layers perform non-convolutional downsampling of the feature maps.
Further, the first, second, third and fourth aggregated window self-attention learning layers each comprise the following steps:
partitioning the input feature map into several windows;
performing self-attention learning independently inside each window;
shifting each window after self-attention learning toward its four corners, forming aggregation windows at the corners of the windows;
performing aggregated self-attention learning inside the aggregation windows and outputting the strengthened feature map.
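A minimal tensor-level sketch of the window partition and the corner shift (assuming 4×4 windows and a 2-pixel shift, the values given in the embodiment later in this description; the attention computation itself is sketched further below):

```python
import torch

def window_partition(x, ws=4):
    """Split a (B, H, W, C) feature map into (num_windows*B, ws, ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws, ws, C)

def corner_shift(x, shift=2):
    """Roll the map so former window corners meet inside new aggregation windows."""
    # rolling by (-shift, -shift) makes each 4x4 aggregation window straddle
    # the corners of four neighboring original windows
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

x = torch.randn(1, 16, 16, 96)                 # (B, H, W, C) toy feature map
wins = window_partition(x)                     # per-window self-attention runs here
agg_wins = window_partition(corner_shift(x))   # aggregation windows at old corners
```

The roll mirrors the shifted-window idea: it lets information flow across the boundaries of the original windows, which is where the back/background junction pixels live.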
Further, the self-attention learning takes the vector corresponding to the information of each pixel in the single-channel feature map as the input vector, generates the corresponding information matrices through weight matrices, and inputs the information matrices into the self-attention learning function.
In the aggregated self-attention learning process, let $F_{i,j}$ denote the feature matrix of the aggregation-window pixel at row $i$, column $j$ of the upper-left aggregation window; $F_{i+1,j}$ and $F_{i,j+1}$ are then the feature matrices of the pixels aggregated toward the window interior, $s$ is the back feature quantity in $F_{i,j}$, and $s'$ is the back feature quantity after aggregation.
When the aggregation shows that an aggregation window contains only back-region features and no background-region features, that aggregation window is screened out.
When the back feature quantity of the feature matrix grows from $s$ to $s'$ and $s'$ is the unique maximum among the feature quantities of the four aggregation windows, that aggregation window is determined to be the one, among the four, most likely to contain a back feature key point.
Further, the back feature key point detection network comprises a global pooling layer and a classifier. The global pooling layer applies global average pooling to the back feature map, compressing it into a feature vector of fixed length; the classifier feeds this feature vector into a fully connected layer to obtain the predicted labels and coordinates of the back feature key points.
The present invention also adopts a robot applying the above acupoint positioning method, comprising an image acquisition module, a processing module, a control module and an end effector.
The image acquisition module is used to acquire an image of a human back.
The processing module is used to preprocess the acquired back image to obtain a preprocessed back image; to input the preprocessed back image into the multi-scale feature extraction network to obtain a back feature map; to input the back feature map into the back feature key point detection network to detect back feature key points; and to obtain, from the back feature key points, the specific coordinates of the back acupoints through the back acupoint positioning formula, achieving back acupoint positioning.
The control module is used to control the end effector to perform kneading massage on the located back acupoints.
Beneficial effects: Compared with the prior art, the present invention has the notable advantage that, when extracting the features of the back feature key points, an aggregated window self-attention learning method is applied according to the position of each key point on the back, so that key point positions can be determined more accurately and quickly for human backs of different body shapes. The back acupoint positioning formula locates acupoints automatically, addressing the problems that most existing back acupoint positioning methods must re-locate acupoints for each body shape, do not position acupoints automatically, and suffer from low positioning accuracy and heavy workload. When detecting the back feature key points, the binarized image of the original RGB image is used as a mask to compensate for the missing regions of the depth image along the contour of the human back, which delineates the back region better and facilitates key point detection.
Brief Description of the Drawings
FIG. 1 is a schematic workflow diagram of the acupoint positioning method of the present invention.
FIG. 2 is an image of a human back captured by the camera in the present invention.
FIG. 3 is the visible depth information map obtained by visualizing depth information on the human back image in the present invention.
FIG. 4 is the depth compensation map obtained after contour compensation of the visible depth information map in the present invention.
FIG. 5 shows back feature key point detection results obtained with the prior art, without aggregated window self-attention learning.
FIG. 6 shows back feature key point detection results obtained with the present invention, with aggregated window self-attention learning.
Detailed Description of Embodiments
In this embodiment, a RealSense D435i depth binocular camera captures RGB images of the human back; the images are processed by software on a host computer, the processing results are displayed on an LCD screen, and the results are sent to the robot processor for kneading massage. The method can be applied in fields such as human acupoint positioning and autonomous robot massage.
As shown in FIG. 1, the acupoint positioning method based on an aggregated window self-attention mechanism in this embodiment comprises the following steps:
(1) The robot captures an RGB image of the human back with the depth binocular camera and preprocesses it;
(2) The preprocessed back image is input into the multi-scale feature extraction network and passes in sequence through the first-level, second-level, third-level and fourth-level feature map extraction modules, finally outputting a fourth-level feature map;
(3) The fourth-level feature map is input into the back feature key point detection network to detect the back feature key points of the preprocessed back image, and the specific coordinates of the back acupoints are obtained through the back acupoint positioning formula.
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
(1) The captured RGB image of the human back is shown in FIG. 2; the terms human back RGB image, original human back RGB image and original RGB image in all steps below refer to the image in FIG. 2.
(2) Preprocessing the human back RGB image: to better separate the back region from the background, the depth information of the original RGB image is extracted and fused with the original image through the depth information fusion formula to obtain a visible depth information map, and the binarized image of the original RGB image is used as a mask to compensate for the missing regions of the depth image along the contour of the human back, yielding a compensated depth map. Specifically:
(2.1) The disparity map corresponding to the original RGB image captured by the depth binocular camera is obtained, and the disparity map is converted into a depth map through the depth conversion formula to obtain the depth information:

$$Z = \frac{B \cdot f}{d}, \qquad d = x_l - x_r$$

where $Z$ is the depth information, $d$ is the disparity, $B$ is the baseline length, $f$ is the focal length, and $x_l$ and $x_r$ are the horizontal coordinates of the same position in the disparity maps corresponding to the left and right cameras of the depth binocular camera.
(2.2) The depth information of the depth map is fused into the original RGB image to obtain the visible depth information map, as shown in FIG. 3:

$$X_i = \frac{(u_i - u_0)\,\mathrm{d}x \cdot D_i}{f}, \qquad Y_i = \frac{(v_i - v_0)\,\mathrm{d}y \cdot D_i}{f}, \qquad Z_i = D_i$$

where $D_i$ is the depth information of the $i$-th pixel in the original human back RGB image, $u_i$ and $v_i$ are its horizontal and vertical coordinates, $u_0$ and $v_0$ are the horizontal and vertical coordinates of the intersection of the camera optical axis with the imaging plane, $(X_i, Y_i, Z_i)$ are the fusion point coordinates of the $i$-th pixel in the visible depth information map, and $\mathrm{d}x$ and $\mathrm{d}y$ are the lengths corresponding to a single pixel along the $x$-axis and $y$-axis directions.
(2.3) On the basis of the depth map, the original RGB image is first binarized to obtain a binarized image, and this image is used as a mask to compensate for the missing regions of the depth image along the contour of the human back. Specifically, the compensated depth map is obtained by randomized fusion of the binarized image and the visible depth information map using an arbitrary binary degree number, each pixel at row $i$, column $j$ of the map being compensated in turn.
For disturbances arising in the contour compensation process, i.e., places where the binarized image and the visible depth information map cannot be aligned during compensation, the regression reward model performs refinement of the region specified by the mask. Through these operations, the regression reward model can concentrate on correcting the depth values in each region of the compensated depth map, and the boundary of the binary mask is fully preserved after the disturbances are removed, so the boundary is not damaged by the removal of interference. FIG. 4 shows the depth compensation map obtained after compensating the visible depth information map.
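A rough sketch of the masking step, under the simplest reading of the compensation: the binary back mask gates the visible depth map, and holes inside the back are filled from nearby valid depths; the regression-reward refinement is not reproduced here:

```python
import numpy as np

def compensate_depth(depth_vis, back_mask):
    """Use the binarized back image as a mask to patch holes in the depth map.

    depth_vis: (H, W) visible depth information map (0 where depth is missing)
    back_mask: (H, W) binary mask, 1 on the human back, 0 on background
    """
    comp = depth_vis * back_mask             # keep depth only inside the back
    holes = (back_mask == 1) & (comp == 0)   # back pixels with missing depth
    if holes.any():
        valid = comp[(back_mask == 1) & (comp > 0)]
        fill = np.median(valid) if valid.size else 0.0
        comp[holes] = fill                   # crude fill; the patent refines this
    return comp

depth_vis = np.random.uniform(0.5, 1.5, (480, 640))
depth_vis[200:210, 300:320] = 0.0            # simulated sensor holes
mask = np.zeros((480, 640)); mask[100:400, 200:440] = 1.0
depth_comp = compensate_depth(depth_vis, mask)
```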
(3) The compensated depth map is input into the multi-scale feature extraction network and passes in sequence through the first-level, second-level, third-level and fourth-level feature map extraction modules, finally producing the fourth-level feature map:
(3.1) First-level feature map extraction module: the preprocessed back image is input in sequence into the patch partition transformation layer and the first aggregated window self-attention learning layer, and the first-level feature map is output.
The patch partition transformation layer comprises:
first performing patch cropping on the preprocessed back image, i.e., grouping every 4×4 adjacent pixels of the preprocessed image into one patch and flattening each patch along the color channel direction. Since each patch has 16 pixels and the input is a three-channel RGB image with R, G and B values per pixel, flattening yields 16×3 = 48 channels, and the image size changes from $H \times W \times 3$ to $\frac{H}{4} \times \frac{W}{4} \times 48$, where $H$ is the height and $W$ the width of the preprocessed image and 48 is its depth; a linear transformation is then applied to the channel data of each pixel to compress the depth from 48 to $C$, i.e., the image size changes from $\frac{H}{4} \times \frac{W}{4} \times 48$ to $\frac{H}{4} \times \frac{W}{4} \times C$, where the compressed depth $C$ can be set manually.
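A compact sketch of this patch partition and depth compression (the choice C = 96 is an assumed setting, since the text leaves C user-selectable):

```python
import torch
import torch.nn as nn

class PatchPartition(nn.Module):
    """Group 4x4 pixel patches, flatten to 48 channels, project to depth C."""
    def __init__(self, c_out=96):
        super().__init__()
        self.proj = nn.Linear(4 * 4 * 3, c_out)    # linear compression 48 -> C

    def forward(self, x):                           # x: (B, 3, H, W)
        B, C, H, W = x.shape
        x = x.unfold(2, 4, 4).unfold(3, 4, 4)       # (B, 3, H/4, W/4, 4, 4)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, H // 4, W // 4, 48)
        return self.proj(x)                         # (B, H/4, W/4, C)

feat = PatchPartition()(torch.randn(1, 3, 256, 256))   # -> (1, 64, 64, 96)
```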
The first aggregated window self-attention learning layer comprises:
partitioning the preprocessed image processed by the patch partition transformation layer into windows of size 4×4, then performing self-attention learning independently inside each window; after the individual windows are learned, each window is shifted toward its four corners by the adjacent 2 pixels, forming aggregation windows at the window corners; aggregated self-attention learning is then performed inside the aggregation windows alone, strengthening the feature learning of pixels in the initial feature map that may contain back feature key points, and the first-level feature map is output.
(3.2) Second-level feature map extraction module: the first-level feature map is input into the first patch merging layer and the second aggregated window self-attention learning layer, and the second-level feature map is output.
The first patch merging layer performs non-convolutional downsampling of the first-level feature map, changing its size to $\frac{H}{8} \times \frac{W}{8} \times 2C$.
The second aggregated window self-attention learning layer partitions the first-level feature map into windows of size 4×4, then performs self-attention learning independently inside each window; after the individual windows are learned, each window is shifted toward its four corners by the adjacent 2 pixels, forming aggregation windows at the window corners; aggregated self-attention learning is then performed inside the aggregation windows alone, strengthening the feature learning of pixels in the first-level feature map that may contain back feature key points, and the second-level feature map is output.
(3.3) Third-level feature map extraction module: the second-level feature map is input into the second patch merging layer and the third aggregated window self-attention learning layer, and the third-level feature map is output.
The second patch merging layer performs non-convolutional downsampling of the second-level feature map, changing its size to $\frac{H}{16} \times \frac{W}{16} \times 4C$.
The third aggregated window self-attention learning layer partitions the second-level feature map into windows of size 4×4, then performs self-attention learning independently inside each window; after the individual windows are learned, each window is shifted toward its four corners by the adjacent 2 pixels, forming aggregation windows at the window corners; aggregated self-attention learning is then performed inside the aggregation windows alone, strengthening the feature learning of pixels in the second-level feature map that may contain back feature key points, and the third-level feature map is output.
(3.4) Fourth-level feature map extraction module: the third-level feature map is input into the third patch merging layer and the fourth aggregated window self-attention learning layer, and the fourth-level feature map is output.
The third patch merging layer performs non-convolutional downsampling of the third-level feature map, changing its size to $\frac{H}{32} \times \frac{W}{32} \times 8C$.
The fourth aggregated window self-attention learning layer partitions the third-level feature map into windows of size 4×4, then performs self-attention learning independently inside each window; after the individual windows are learned, each window is shifted toward its four corners by the adjacent 2 pixels, forming aggregation windows at the window corners; aggregated self-attention learning is then performed inside the aggregation windows alone, strengthening the feature learning of pixels in the third-level feature map that may contain back feature key points, and the fourth-level feature map is output.
Non-convolutional downsampling comprises:
when a feature map is input into the next-level feature extraction module, it first enters the patch merging layer of that module. The patch merging layer groups every 2×2 adjacent pixel region into a patch and gathers the pixels at the same position within each patch, obtaining four single-channel feature maps of half the height and width; these four feature maps are then concatenated along the depth direction, and a fully connected layer finally applies a linear transformation along the depth direction of the concatenated map, changing the depth of the feature map from C to 2C.
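This merging step can be sketched as follows: same-position pixels are gathered from each 2×2 patch, concatenated along depth to 4C, and a linear layer then maps 4C to 2C:

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Non-convolutional 2x downsampling: (B, H, W, C) -> (B, H/2, W/2, 2C)."""
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)   # 4C -> 2C

    def forward(self, x):                        # x: (B, H, W, C)
        x0 = x[:, 0::2, 0::2, :]                 # same-position pixels of each
        x1 = x[:, 1::2, 0::2, :]                 # 2x2 patch, gathered into four
        x2 = x[:, 0::2, 1::2, :]                 # half-resolution feature maps
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # depth-direction concat: 4C
        return self.reduction(x)                 # fully connected layer: 4C -> 2C

y = PatchMerging(96)(torch.randn(1, 64, 64, 96))   # -> (1, 32, 32, 192)
```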
Self-attention learning comprises: taking the vector corresponding to the information of each pixel in the single-channel feature map as the input vector, and generating the corresponding $Q$, $K$ and $V$ through the weight matrices $W_Q$, $W_K$ and $W_V$; the vector lengths of $Q$, $K$ and $V$ are kept consistent with the depth $C$ of the single-channel feature map. Over all pixels, the three matrices are generated as:

$$Q = X W_Q, \qquad K = X W_K, \qquad V = X W_V$$

where $X$ is the matrix obtained by concatenating all pixels of the single-channel feature map of height $H$, width $W$ and depth $C$; $W_Q$, $W_K$ and $W_V$ are the weight matrices needed to generate the corresponding $Q$, $K$ and $V$; and $Q$, $K$ and $V$ are the information matrices obtained by transforming the input vectors with $W_Q$, $W_K$ and $W_V$ respectively.
The resulting $Q$, $K$ and $V$ information matrices are input into the aggregated self-attention learning function:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{D}} + B\right) V$$

where $D$ is the vector dimension and $B$ is the aggregate absolute offset between vectors.
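A direct sketch of this attention function; the offset matrix B is modeled as a learnable parameter initialized to zero, an assumption made for illustration:

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """softmax(Q K^T / sqrt(D) + B) V over the pixels of one window."""
    def __init__(self, dim, n_tokens=16):                  # 16 pixels per 4x4 window
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)          # W_Q
        self.wk = nn.Linear(dim, dim, bias=False)          # W_K
        self.wv = nn.Linear(dim, dim, bias=False)          # W_V
        self.b = nn.Parameter(torch.zeros(n_tokens, n_tokens))  # offset term B

    def forward(self, x):                                   # x: (B, N, D)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        attn = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5) + self.b
        return attn.softmax(dim=-1) @ v

out = WindowSelfAttention(96)(torch.randn(8, 16, 96))       # 8 windows of 16 pixels
```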
Aggregated self-attention learning: a 4×4 window is taken at random; let the feature matrix of the aggregation-window pixel at row $i$, column $j$ of the upper-left aggregation window be $F_{i,j}$; the feature matrices of the pixels aggregated toward the window interior are then $F_{i+1,j}$ and $F_{i,j+1}$, and the aggregated self-attention learning process combines them, where $s$ is the back feature quantity in $F_{i,j}$, $s'$ is the back feature quantity after aggregation, and the back feature quantity of a pixel is 1 by default.
The result is normalized by the softmax function to obtain the feature matrix of the aggregation window; repeating this process yields the feature matrices of the remaining three aggregation windows of the same 4×4 window. When the feature matrix shows that an aggregation window contains only back-region features and no background-region features, that aggregation window is screened out; when the back feature quantity of the feature matrix grows from $s$ to $s'$ and is the unique maximum among the feature quantities of the four aggregation windows, that aggregation window is determined to be the one, among the four, most likely to contain a back feature key point. Finally, the actual positions of the back feature key points are obtained from the feature information of the aggregation window most likely to contain them.
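A schematic sketch of the screening rule; the back feature quantity is modeled simply as a count of back-labeled pixels per aggregation window, an assumption standing in for the patent's aggregation formula:

```python
import numpy as np

def select_aggregation_window(s_before, s_after, window_pixels=16):
    """Screen four aggregation windows and pick the likeliest key point region.

    s_before / s_after: length-4 arrays of back feature quantities (s and s')
    for the four aggregation windows of one 4x4 window. Returns the index of
    the selected window, or None if none qualifies.
    """
    s, s2 = np.asarray(s_before), np.asarray(s_after)
    keep = s2 < window_pixels           # all-back windows carry no boundary info
    grew = s2 > s                       # back feature quantity grew from s to s'
    candidates = np.where(keep & grew)[0]
    if candidates.size == 0:
        return None
    best = candidates[np.argmax(s2[candidates])]
    # require a unique maximum among the four aggregation windows
    return int(best) if (s2 == s2[best]).sum() == 1 else None

idx = select_aggregation_window([5, 9, 16, 3], [8, 10, 16, 3])   # -> 1
```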
In this embodiment, because the back feature key points to be detected lie at the junction between the human back and the background in the image, even the compensated depth map is disturbed to some extent by background pixels at the junction, which biases the key point detection results and in turn degrades the accuracy of the entire back acupoint positioning. Four aggregated window self-attention learning layers are therefore used to iterate continually, strengthening the feature learning of pixels at every feature map level that may contain back feature key points.
As shown in FIG. 5, in the back feature key point detection results obtained without aggregated window self-attention learning, the key points at the upper-left and upper-right of the back clearly drift toward the inside of the back, so acupoints near the back edge, such as the Chengfeng point, cannot be located precisely, and the two key points at the bottom of the back are not detected at all because of interference from background pixels at the junction. As shown in FIG. 6, in the results obtained with aggregated window self-attention learning, the four feature key points lie close to the back edge, and the quadrilateral they form contains all 27 acupoints to be detected on the back, ensuring that the 27 acupoints can be located correctly and precisely by the back acupoint positioning formula.
(4) The fourth-level feature map is input into the back feature key point detection network to detect the back feature key points of the preprocessed back image, and the specific coordinates of the back acupoints are obtained through the back acupoint positioning formula. The back feature key point detection network comprises:
a global pooling layer, which applies global average pooling to the fourth-level feature map, compressing it into a feature vector of fixed length; this feature vector is the feature representation of the back feature key points in the preprocessed image;
a classifier, which feeds the feature vector obtained by global average pooling into a fully connected layer to obtain the predicted labels and coordinates of the back feature key points, which are then annotated on the original back RGB image.
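A minimal sketch of this detection head, assuming four key points each carrying a label score and an (x, y) coordinate, and an input depth of 8C = 768 following the stage depths above:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Global average pooling + fully connected layer over the back feature map."""
    def __init__(self, in_dim=768, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Linear(in_dim, n_points * 3)      # per point: label, x, y

    def forward(self, feat):                            # feat: (B, H', W', C)
        vec = feat.mean(dim=(1, 2))                     # global average pooling
        out = self.fc(vec).view(-1, self.n_points, 3)
        return out[..., 0], out[..., 1:]                # labels (B,4), coords (B,4,2)

labels, coords = KeypointHead()(torch.randn(2, 8, 8, 768))
```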
The back acupoint positioning formula comprises:
by default, the four detected key points give the coordinates of the upper-left, upper-right, lower-left and lower-right corners of the human back, with the intersection of the optical axis of the depth binocular camera and the original back RGB image as the relative origin of the coordinate axes. Following the bone-proportional (bone cun) measurement method of traditional Chinese medicine, the spinal joints of the back serve as landmarks, the length between the two ends of a joint segment is divided into a fixed number of cun, and the cun are converted into metric distance, one cun being about 3.3 cm, to determine the coordinates of the back acupoints. Since the back has many acupoints and they are symmetric about the spine, 27 acupoints on the left half of the back and on the back midline are selected for positioning, each located by a coordinate expression relative to the four key points:
(1) Chengfeng, Quheng, Tianzong;
(2) Fufen, Pohu, Gaohuang, Shentang, Yixi, Geguan, Hunmen, Yanggang, Yishe; Fengmen, Feishu, Jueyinshu, Xinshu, Dushu, Geshu, Weiwanxiashu, Ganshu, Danshu, Pishu;
(3) Shenzhu, Shendao, Lingtai, Zhiyang, Jinsuo.
The conversion of the above acupoints into coordinates is specified as follows:
(1) Chengfeng, Quheng and Tianzong are located on the shoulders;
(2) Fufen, Pohu, Gaohuang, Shentang, Yixi and Geguan are located below the spinous processes of the 2nd to 5th thoracic vertebrae, 3 cun lateral to them; Hunmen, Yanggang and Yishe are located from the 2nd lumbar vertebra to the 5th sacral vertebra, 3 cun lateral below each spinous process; Fengmen, Feishu, Jueyinshu, Xinshu, Dushu, Geshu and Weiwanxiashu are located below the spinous processes of the 2nd to 5th thoracic vertebrae, 1.5 cun lateral to them; Ganshu, Danshu and Pishu are located from the 2nd lumbar vertebra to the 5th sacral vertebra, 1.5 cun lateral below each spinous process;
(3) Shenzhu, Shendao, Lingtai, Zhiyang and Jinsuo are located on the midline of the back.
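A toy sketch of how a positioning formula of this kind can map the four key points to acupoint coordinates; the proportional offsets used are placeholders, and only the 1 cun ≈ 3.3 cm conversion comes from the text:

```python
import numpy as np

CUN_CM = 3.3   # one cun is about 3.3 cm, per the bone-proportional method above

def acupoint_from_keypoints(p_ul, p_ur, p_ll, p_lr, frac_down, cun_lateral):
    """Placeholder positioning formula in the back plane (coordinates in cm).

    frac_down:   fraction of the way down the spine midline (0 = shoulder line)
    cun_lateral: lateral offset from the midline in cun (0 = midline point)
    """
    top = (np.asarray(p_ul, float) + np.asarray(p_ur, float)) / 2
    bottom = (np.asarray(p_ll, float) + np.asarray(p_lr, float)) / 2
    p = top + frac_down * (bottom - top)                # point on the back midline
    return p + np.array([-cun_lateral * CUN_CM, 0.0])   # shift toward the left side

# four detected key points (upper-left, upper-right, lower-left, lower-right):
feishu = acupoint_from_keypoints((-15, 20), (15, 20), (-13, -25), (13, -25),
                                 frac_down=0.2, cun_lateral=1.5)  # illustrative only
```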
The robot processor controls the robot's end effector to reach the back acupoints and perform kneading massage.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410385415.9A CN117974740B (en) | 2024-04-01 | 2024-04-01 | Acupoint positioning method and robot based on aggregation type window self-attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410385415.9A CN117974740B (en) | 2024-04-01 | 2024-04-01 | Acupoint positioning method and robot based on aggregation type window self-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117974740A (en) | 2024-05-03
CN117974740B (en) | 2024-07-02
Family
ID=90861350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410385415.9A Active CN117974740B (en) | 2024-04-01 | 2024-04-01 | Acupoint positioning method and robot based on aggregation type window self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117974740B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20220098895A (en) * | 2021-01-05 | 2022-07-12 | 주식회사 케이티 | Apparatus and method for estimating the pose of the human body |
CN115937899A (en) * | 2022-12-13 | 2023-04-07 | 中建八局西北建设有限公司 | Lightweight human body key point detection method based on deep learning |
CN116403241A (en) * | 2023-04-04 | 2023-07-07 | 网易(杭州)网络有限公司 | Recognition method, device, electronic equipment and readable storage medium of acupuncture points on the back |
CN116823800A (en) * | 2023-07-17 | 2023-09-29 | 重庆交通大学 | Bridge concrete crack detection method based on deep learning under complex background |
CN116844189A (en) * | 2023-07-27 | 2023-10-03 | 上海工程技术大学 | Detection methods and applications of human body part anchor frames and acupuncture points |
CN116977638A (en) * | 2023-07-27 | 2023-10-31 | 中国科学院苏州生物医学工程技术研究所 | Human back acupoint recognition implementation method, device and medium based on deep learning |
CN117058760A (en) * | 2023-08-16 | 2023-11-14 | 淮阴工学院 | Human body posture estimation method and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118365709A (en) * | 2024-06-19 | 2024-07-19 | 江汉大学 | Hand acupoint positioning method, device, acupuncture robot and storage medium |
US12266206B1 (en) | 2024-06-19 | 2025-04-01 | Jianghan University | Hand acupoint positioning method, device, acupuncture robot and storage medium |
CN118628840A (en) * | 2024-08-12 | 2024-09-10 | 杭州医尔睿信息技术有限公司 | Human meridian acupoint location visualization method and device based on AI image recognition |
Also Published As
Publication number | Publication date |
---|---|
CN117974740B (en) | 2024-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117974740B (en) | Acupoint positioning method and robot based on aggregation type window self-attention mechanism | |
CN110561399B (en) | Auxiliary shooting device for dyskinesia condition analysis, control method and device | |
CN112308932B (en) | Gaze detection method, device, equipment and storage medium | |
WO2020172783A1 (en) | Head posture tracking system used for transcranial magnetic stimulation diagnosis and treatment | |
CN102930534A (en) | Method for automatically positioning acupuncture points on back of human body | |
CN116258933B (en) | Medical image segmentation device based on global information perception | |
CN103295209A (en) | Splicing method and system for DR images | |
CN112349391A (en) | Optimized rib automatic labeling method | |
CN115187550B (en) | Target registration method, device, equipment, storage medium and program product | |
CN107967687A (en) | A method and system for acquiring the walking posture of a target object | |
CN116152073B (en) | Improved multi-scale fundus image stitching method based on Loftr algorithm | |
CN116664507B (en) | A system and method for objective quality assessment of stereoscopic panoramic images | |
US8351650B2 (en) | Foreground action estimating apparatus and foreground action estimating method | |
CN115984203A (en) | Eyeball protrusion measuring method, system, terminal and medium | |
CN119887807B (en) | Multi-mode medical image segmentation method and system based on light weight | |
CN119580310A (en) | A method for accurately determining acupoints based on artificial intelligence | |
Guo et al. | Cobb angle rectification with dual-activated linformer | |
CN112330629A (en) | Facial nerve disease rehabilitation condition static detection system based on computer vision | |
CN115424319A (en) | Strabismus recognition system based on deep learning | |
CN113591699B (en) | Online visual fatigue detection system and method based on deep learning | |
CN119851330B (en) | A method, device and medium for locating face parts based on multi-region perception | |
Sassi et al. | MR image monomodal registration using structure similarity index | |
CN114190922B (en) | TMS head movement detection method | |
HK40019357A (en) | Auxiliary photographic device for analyzing dyskinesia, and control method and apparatus therefor | |
HK40019357B (en) | Auxiliary photographic device for analyzing dyskinesia, and control method and apparatus therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |