GB2628958A

GB2628958A - A method of infrastructure-augmented cooperative perception for autonomous vehicles based on voxel feature aggregation

Info

Publication number: GB2628958A
Application number: GB2412153.5A
Authority: GB
Inventors: Zhao Cong; Du Yuchuan; Zhu Yifan; Ji Yuxiong
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-03-01
Filing date: 2021-04-01
Publication date: 2024-10-09
Anticipated expiration: 2041-04-01
Also published as: GB2619196B8; GB2619196B; GB2628958B; GB2619196A; CN115943439A; GB202316614D0; GB202313217D0; GB2621048A; GB2629743A; CN115605777A

Abstract

The present invention discloses a method of infrastructure-augmented cooperative perception for autonomous vehicles based on voxel feature aggregation. The method includes the following steps: deploying a roadside LiDAR, configuring the corresponding roadside computing device for said roadside LiDAR; calibrating the extrinsic parameters of said roadside LiDAR; calculating the relative pose of an autonomous driving vehicle concerning said roadside LiDAR based on positioning data and extrinsic parameters; transforming the point clouds detected by said roadside LiDAR into the coordinate system of said autonomous driving vehicle using its relative pose; voxelizing the transformed point clouds to obtain voxelized transformed point clouds. Voxelization processing is also applied to point clouds detected by the onboard LiDAR to obtain voxelized onboard LiDAR point clouds. Said roadside computing device calculates voxel-level features of voxelized transformed point clouds through a feature extraction network, while said autonomous driving vehicle calculates voxellevel features of said onboard LiDAR point cloud. The computed features are compressed and transmitted to a computing device, which can be either an autonomous driving vehicle, a roadside computing device, or a cloud server. Said computing device aggregates these features and inputs them into a deep neural network-based 3D object detection model based on voxels to obtain object detection results. When the said computing device is roadside or cloud-based, the said computing device sends back object detection results to said autonomous driving vehicle

Description

A METHOD OF INFRASTRUCTURE-AUGMENTED COOPERATIVE PERCEPTION FOR AUTONOMOUS VEHICLES BASED ON VOXEL FEATURE AGGREGATION

Technical Field

The present invention belongs to the field of autonomous driving vehicle-road coordination technology and relates to a method of infrastructure-augmented cooperative perception for autonomous vehicles based on voxel feature aggregation.

Background Technology

In the 21st century, with the continuous development of urban roads and the automotive industry, cars have become an essential mode of transportation for people, greatly facilitating daily life and productivity. However, excessive use of cars has also brought about environmental pollution, traffic congestion, traffic accidents, and other problems. To alleviate the problem of excessive car use and free people from the transportation system while improving vehicle driving ability, autonomous driving vehicles have gradually become an important direction for future car development. With the rise of deep learning technology and the increasing attention to artificial intelligence, autonomous 00 driving, as a prominent focal point in Al, has also gained tremendous popularity.

Autonomous driving is a comprehensive system of software and hardware interaction. The core O technologies of autonomous driving include hardware (automobile manufacturing technology, autonomous driving chips), autonomous driving software, high-precision maps, sensor Ncommunication networks, etc. From the perspective of software, it can be divided into three modules: environmental perception, behavioral decision-making, and motion control.

Perception is the first step in autonomous driving and serves as the link between vehicles and their environment. The overall performance of an autonomous driving system depends largely on the quality of its perception system. Perception in autonomous driving vehicles is achieved through sensors, with LiDAR using lasers for detection and measurement. Its principle involves emitting pulsed lasers around the vehicle which reflect when they encounter objects, calculating distance based on time difference to establish a 3D model of surrounding environments. LiDAR has high precision and long-range capabilities due to its short wavelength, allowing it to detect even small targets from far away distances. Point cloud data obtained by LiDAR contains large amounts of information with higher accuracy, making it ideal for target detection and classification within autonomous driving perception systems. On one hand, LiDAR overturns traditional 2D projection imaging modes by collecting depth information about target surfaces to obtain more complete spatial information about targets: reconstructed 3D models can better reflect geometric shapes while also providing rich feature information such as surface reflection characteristics or motion speed that support target detection, recognition, tracking, etc., reducing algorithmic complexity. On the other hand, active laser technology provides high measurement resolution along with strong anti-interference ability against stealthy targets while working under all weather conditions.

Currently, based on the presence or absence of mechanical components, LiDAR can be divided into mechanical LiDAR and solid-state LiDAR. Although solid-state LiDAR is considered to be the future trend, in the current battlefield of LiDAR, mechanical LiDAR still occupies the mainstream position. Mechanical LiDARs have rotating parts that control the angle of laser emission, while solid-state lasers do not require mechanical rotating parts and mainly rely on electronic components to control the angle of laser emission.

In existing autonomous driving solutions, LiDAR is the most important sensor in its environmental perception module, responsible for real-time map building and positioning, target detection, and other tasks in environmental perception. For example, Google Waymo has added five LiDARs to its sensor configuration plan. Four side-facing LiDARs are distributed around the front and rear of the vehicle as medium-to-short-range multi-line radar to supplement blind spot vision; a high-linecount LiDAR is installed on top for large-scale perception, with blind spots supplemented by four side-facing LiDARs.

The scanning data of the LiDAR sensor is recorded in the form of point clouds. Point cloud data refers to a collection of vectors in a three-dimensional coordinate system. These vectors are usually represented in the form of X, Y, and Z coordinates. In addition to containing three-dimensional coordinates, each point may also contain color information (RGB) or reflection intensity information.

Among them, the X, Y, and Z columns represent the three-dimensional position of point data in either the sensor coordinate system or the world coordinate system, generally measured in meters. The Intensity column represents the laser reflection intensity at each point, with values normalized between 0 and 255 and without a specific unit.

Due to the height limitation of the installation of the onboard LiDAR, which is mainly determined by the size of the vehicle and is usually only about two meters high, its ability to detect information CO can be easily affected by obstacles around the vehicle. For example, a cargo truck driving in front of a small car can almost completely block the forward vision of LiDAR on the small car, severely reducing O its environmental perception capability. In addition, the performance of LiDAR itself may also be C\J limited by overall vehicle cost considerations: therefore, expensive high-line-count LiDARs are often not installed in vehicles. As a result, point cloud data obtained from onboard LiDARs often have blind spots or sparse areas. It is difficult for automatic driving perception tasks to rely solely on sensors installed in vehicles. Compared with onboard LiDARs, roadside LiDARs have more transparent views that are less likely to be blocked because they can be deployed at higher gantries or lamp posts. In addition, roadside LiDARs have a higher tolerance for costs and can use higher line-count lasers while configuring higher computing power units for faster detection speed and better detection performance.

At present, the vehicle-road coordination system is in a wave of research and testing. The intelligent vehicle-road coordination solution based on V2X technology can enhance the current achievable advanced driving assistance functions, improve vehicle driving safety and road operation efficiency, and provide data services and technical support for autonomous driving in the future.

The existing LiDAR vehicle-road collaborative solution involves each vehicle and roadside facility detecting targets based on their own LiDAR point cloud data, and then the facility sends the detection results to the vehicle. Most scholars focus on analyzing the reliability of data transmission, calculating relative pose between vehicles and roadsides, or processing data transmission delays between vehicles and roadsides. They all assume that target detection results are directly sent during the collaborative process. Although this approach has a low amount of data transmission, it still cannot fully utilize the detection data from both ends. For example, when neither end detects complete target point clouds, missed detections or false alarms can easily occur leading to errors in collaborative target detection results. To address this issue, some scholars propose sending original point cloud data to prevent information loss. For instance, the Cooper framework proposed in 2019 first introduced a cooperative perception scheme at the raw point cloud level by fusing different sources of point cloud data significantly improving perception performance.

However, at the same time, the size of a single frame of LiDAR point cloud data is often over ten or even dozens of megabytes. The existing vehicle-road cooperative communication conditions are difficult to support such a large amount of real-time point cloud data transmission. Therefore, autonomous driving technology urgently needs a better collaborative detection method that utilizes LiDAR data on both ends, which not only meets the requirements for target detection accuracy but also minimizes the amount of data transmission.

Existing target recognition and classification algorithms based on LiDAR point cloud data are all based on deep neural network technology.

Existing Technology Patent document US9562971B2 Patent document US20150187216A1 Patent document CN110989620A Patent document CN110781927A Patent document CN111222441A Patent document CN108010360A CO Invention Content To solve the above problems, the present invention provides a vehicle-road cooperative O perception method for 3D object detection based on deep neural networks with feature sharing and C\I provides a LiDAR point cloud data-based vehicle-road collaboration scheme that balances transmission data size and information loss degree. It is used to solve the problem of insufficient single-vehicle perception ability of current autonomous driving vehicles while the bandwidth of vehicle-road collaborative communication is insufficient.

The specific technical problems to be solved include determining the layout plan of roadside LiDAR, selecting an extrinsic parameter calibration method for roadside LiDAR, calculating extrinsic parameters based on the relative pose between autonomous driving vehicles and roadside LiDAR, and determining a suitable information representation form for vehicle-road collaboration.

The goal of this invention is: to ensure reduced information transmission volume under the premise of collaborative perception capability between vehicles and roads.

The patent solution to the technical problems in this invention is divided into the preparation stage and the application stage. The steps in the preparation stage are as follows: A. Install a roadside LiDAR and configure the corresponding roadside computing device; B. Calibrate the extrinsic parameters for said roadside LiDAR. The steps in the application stage are as follows: C. Said roadside computing device calculates the relative pose between an autonomous driving vehicle and said roadside LiDAR based on positioning data from said autonomous driving vehicle and said extrinsic parameters of said roadside LiDAR; D. Said roadside computing device transforms roadside point clouds from the roadside LiDAR coordinate system into the coordinate system of said autonomous driving vehicles according to the relative poses obtained in step C, obtaining transformed point clouds; E. Said roadside computing device voxelizes said transformed point clouds, obtaining voxelized transformed point clouds; Said autonomous driving vehicle voxelizes the onboard [DAR point cloud, obtaining voxelized onboard point cloud; F. Said roadside computing devices calculate voxel-level features of transformed point clouds through a feature extraction network, obtaining voxel-level features of said voxelized transformed point clouds; Said autonomous driving vehicle calculates voxel-level features of voxelized onboard LiDAR point cloud through said deep feature extraction network, obtaining voxel-level features of onboard LiDAR point cloud; The follow-up steps are divided into three sub-plans: I, II, and Ill. Sub-plan I completes steps G1, H1, and 11 on the roadside computing device; sub-plan II completes steps G2, H2, and 12 on the autonomous driving vehicle; sub-plan III completes steps G3, H3, and 13 in the cloud.

In sub-plan I: G1. Said autonomous driving vehicle compresses said voxel-level features of the onboard LiDAR point cloud to obtain compressed voxel-level features of the onboard LiDAR point cloud and transmits them to said roadside computing device. Said roadside computing device receives said compressed voxel-level features of the onboard LiDAR point cloud and restores them to voxel-level features of the onboard LiDAR point cloud: H1. Said roadside computing device performs data stitching and data aggregation on voxel-level features of both onboard LiDAR point clouds and transformed point clouds to obtain aggregated voxel-level features; CO 11. Said roadside computing device inputs aggregated voxel-level features into an object detection network model based on voxel-level features to obtain object detection results which are O then transmitted back to said autonomous driving vehicle.

C\J In sub-plan II: G2. Said roadside computing device compresses voxel-level features of transformed point clouds, obtains compressed voxel-level features of transformed point clouds, and transmits it to said autonomous driving vehicle; Said autonomous driving vehicle receives compressed voxel-level features of transformed point clouds, restore it as voxel-level features of transformed point clouds; H2. Said autonomous driving vehicle performs data stitching and data aggregation on both onboard LiDAR point cloud's voxel-level feature and transformed point cloud's voxel-level feature; 12. Said autonomous driving vehicle inputs aggregated volumetric features into a 3D object detection network model based on voxel-level features to obtain object detection results.

In sub-plan III: G3. Said autonomous driving vehicle compresses the voxel-level features of the onboard LiDAR point cloud, obtains compressed voxel-level features of the onboard LiDAR point cloud, and transmits them to a cloud server. Said roadside computing device compresses the voxel-level features of transformed point clouds, obtains compressed voxel-level features of transformed point clouds, and transmits them to said cloud server. Said cloud server receives both compressed voxel-level features of transformed point clouds and compressed voxel-level features of the onboard LiDAR point clouds, restores them respectively: H3. Said cloud server performs data stitching and data aggregation on both onboard LiDAR point cloud's voxel-level feature and transformed point cloud's voxel-level feature; 13. Said cloud server inputs aggregated volumetric features into a 3D object detection network model based on voxel-level features to obtain object detection results which are then transmitted back to autonomous driving vehicles.

The specific technical solutions for the above steps in this invention patent are as follows: A. Deployment of LiDAR The deployment of the roadside LiDAR is determined based on the existing roadside pillar facilities and the type of installed LiDAR in the vehicle-road cooperative scene. The existing roadside LiDARs are installed in a pole or crossbar manner, with specific installation locations being infrastructure pillars such as roadside gantries, streetlights, and signal light poles that have electrical support.

According to whether there are internal rotating components, LiDARs can be divided into mechanical rotary-type LiDARs, hybrid-type LiDARs, and solid-state LiDARs. Among them, mechanical rotary-type and solid-state LiDARs are commonly used types of roadside LiDARs.

For intersection scenes and other scenarios, deploying a roadside LiDAR with a detection range greater than or equal to the scene range or containing key areas within the scene is sufficient. For long-distance large-scale complex scenes such as expressways, highways, and parks, it is recommended to follow the guidelines for deploying roadside LiDARs so that their coverage area meets full coverage requirements for scenes; i.e., a single roadside LiDAR supplements detection blind spots under other roadside LiDAR detections within its coverage area to achieve better vehicle-road cooperative target detection results.

Guidelines for deploying side-mounted mechanical rotary-type LiDARs differ from those for solid-state side-mounted LiDARs.

CO Ai) Roadside Mechanical Rotating LiDAR and Roadside Hybrid Solid-State LiDAR Deployment Scheme O The mechanical rotating LiDAR achieves laser scanning through mechanical rotation. The laser C\I emitting component is arranged in a vertical direction as a line array of laser sources, which can produce beams pointing at different angles within the vertical plane through lenses. Under the drive of an electric motor, continuous rotation changes the beam from "line" to "plane", forming multiple laser "planes" through rotational scanning and achieving detection in the detection area. The hybrid solid-state LiDAR refers to using semiconductor "micro-movement" devices (such as MEMS scanning mirrors) instead of macroscopic mechanical scanners to achieve laser scanning at a microscopic scale on the radar emission end.

The deployment guidelines for roadside mechanical rotating LiDAR and roadside hybrid solid-state LiDAR require that they be horizontally installed when installing them on roadsides, ensuring full utilization of beam information in all directions. As shown in Figure 2, deploying roadside mechanical rotating LiDAR and roadside hybrid solid-state LiDAR should meet at least the following requirements: H " > f, (1) tan((9°2) Where: H1 represents the installation height of the roadside mechanical rotating LiDAR or roadside hybrid solid-state LiDAR; 0,, represents the angle between the highest elevation angle beam of the roadside mechanical rotating LiDAR or roadside hybrid solid-state LiDAR and the horizontal direction; La represents the distance between adjacent mounting poles of roadside mechanical rotating LiDAR or roadside hybrid solid-state LiDAR.

A2) Roadside Solid-State LiDAR Deployment Scheme The solid-state LiDAR eliminates the mechanical scanning structure, and the laser scanning in both horizontal and vertical directions is achieved electronically. The phased array laser transmitter consists of a rectangular array of multiple transmitting and receiving units. By changing the phase difference of light rays emitted from different units in the array, it is possible to adjust the angle and direction of the emitted laser. After passing through an optical beam splitter, the laser source enters an optical waveguide array where external control changes the phase of light waves on each waveguide to achieve beam scanning using phase differences between waveguides.

As shown in Figure 3, for the roadside deployment of solid-state LiDARs, guidelines require that they meet at least the following requirements: Hn tanell -- (2) tan(9) 2 Where: H1, represents the installation height of the roadside solid-state LiDAR; 06 represents the vertical field of view angle of the roadside solid-state LiDAR; 8 represents the angle between the highest elevation beam of the roadside solid-state LiDAR and the horizontal direction; Lb represents the distance between adjacent installation poles for roadside solid-state LiDARs.

O For scenes where solid-state LiDARs are installed, a method of installing two reverse solid-state C\I LiDARs on one pole can also be used to compensate for blind spots in roadside perception and reduce the demand for roadside poles. In this case, requirements as shown in Figure 4 should be met.

He >L (3) tan(8) Where: H represents the installation height of the roadside solid-state LiDAR; 612 represents the angle between the highest elevation angle beam of the roadside solid-state LiDAR and the horizontal direction; represents the distance between adjacent installation poles of roadside solid-state LiDARs; For LiDAR vehicle-road coordination scenarios that meet these conditions, mechanical rotating or full solid state LiDARs should be deployed on roadsides according to the above requirements, and increase their scanning areas when conditions permit. For LiDAR vehicle-road coordination scenarios that cannot meet these conditions, new poles can be installed and the number of roadside LiDARs can be increased to meet the deployment guidelines for roadside LiDARs.

B. Extrinsic Parameter Calibration To calculate the relative position and orientation between the roadside LiDAR and the onboard LiDAR, it is necessary to calibrate the installation position and angle of the roadside LiDAR, which is called extrinsic parameter calibration. This process obtains coordinate position parameters and angular pose parameters of the LiDAR relative to a certain reference coordinate system. The extrinsic parameters of the LiDAR can be represented by the following vectors: ro=ko yo zo ao flo To] (4) Where: 1.7 represents the coordinates of the roadside LiDAR in the reference coordinate system; Y, represents the coordinates of the roadside LiDAR in the reference coordinate system; Zo represents the coordinates of the roadside LiDAR in the reference coordinate system; at, represents the rotation angle around the axis of roadside LiDAR in the reference coordinate system; flo represents the rotation angle around the axis of roadside LiDAR in the reference coordinate system; represents the rotation angle around the axis of roadside LiDAR in the reference coordinate system; The above benchmark coordinate system can be represented by longitude and latitude coordinate systems such as GCJO2 and WGS84, or it can be based on a specific geographic point in the geodetic coordinate system, such as the Beijing 54 coordinate system and the Xi'an 80 coordinate system. Correspondingly, the actual coordinates of a point in the benchmark coordinate system are related to the coordinates in the roadside LiDAR coordinate system obtained after being detected by the aforementioned LiDAR.

*ct C\I xItclar Y lidar = Rx(a (flo)Re(yo) areal x0 Yo (5) CO O Zzrd _ 1 0 0 y",, _z,"", O Rx(a0)= 0 cos ao sin a, (6) C\1 0 sin a" cos a" cos /I, 0 sin II, Ry)= 0 1 0 (7) sin/3" 0 cos flo cosy, sin yo 0 R,(7.)= sin yo cos yo 0 (8) 0 0 1 Where: i hdars the yi"1"r is the Mar is the X red is y",, is zred X coordinate of the point in the LiDAR coordinate system on the roadside; Y coordinate of the point in the LiDAR coordinate system on the roadside; Z coordinate of the point in the LiDAR coordinate system on the roadside; X -coordinate of this point in reference coordinates; Y -coordinate of this point in reference coordinates; Z -coordinate of this point in reference coordinates; Rja"), RV(/3o), and Rs(yo) are sub-rotation matrices calculated based on the three extrinsic parameters a0, fl", and 70 from three different angles.

The specific values of the extrinsic parameters of the roadside LiDAR are calculated by measuring the coordinates of control points in both the coordinate system of the roadside LiDAR and a reference coordinate system. The steps are as follows: (1) Select at least 4 reflectivity feature points within the detection range of the roadside LiDAR as control points. Reflectivity feature points refer to those with significant differences in reflectivity compared to surrounding objects, such as traffic signs and license plates. The purpose of selecting these points is to quickly find corresponding points between point cloud data and a coordinate in a reference coordinate system based on their position and reflection intensity difference from other points, thus establishing correspondences between multiple pairs of point clouds and coordinates in a reference coordinate system. Control points should be distributed discretely, with no three control points being collinear. More control points lead to better calibration results under conditions that allow for it.

(2 Use high-precision measurement instruments such as handheld RTK devices to measure precise coordinates of control points, then find corresponding point coordinates in the point cloud data from roadside LiDAR; if there is already an accurate map file available for this scene created by using high-precision surveying equipment or other means, direct matching can be performed without using handheld RTK devices.

(3) Use 3D registration algorithms (such as ICP algorithm or NDT algorithm) to calculate optimal values for extrinsic parameter vectors for LiDAR calibration, taking its result as calibration result. Among them, the ICP algorithm is mainly used when solving problems related to CO calibrating extrinsic parameters for LiDARs because it calculates optimal matches between target sets (the set consisting all positions where each selected control point appears in roadside LiDAR's local frame) and source sets (the set consisting all positions where each selected O control point appears relative to some global frame). The error function minimized during C\J the optimization process is defined as follows: E(R,T)=1I: 11 q,-(Rpr+T) 2 (9) R=Rx(a0)Ry(60)R,(710) (10) T=[.x" yo zUf (1l) Where: E(12, is the target error function; R is the rotation transformation matrix; T is the translation transformation matrix; n is the number of nearest point pairs in the point set; p, represents the coordinates of a point i in the target point set P; q, represents the point in the source point set Q that forms the nearest neighbor pair with a point p,.

C. Relative Pose Calculation Based on the positioning data of the autonomous driving vehicle and the extrinsic calibration results of the roadside LiDAR obtained in previous preparation work, determine the relative pose between the autonomous driving vehicle and roadside LiDAR. The relative pose is calculated according to the following formula: -x- = 12,(020)R, [air Vag, vriT X1 y1 xo O (12) Viryz = yriT Y1 (80)R,(7 0) ao Yo 7 r ir (13) Vai -[ar Vi= X, 11 A 20 (14) z1 ai flOi (15) 2,1T Where: V' is the position and angle vector of the autonomous driving vehicle relative to the roadside LiDAR fir y2 is the position vector of the autonomous driving vehicle relative to the roadside LiDAR Tris the angle vector of the autonomous driving vehicle relative to the roadside LiDAR 15," Vi is the position and angle vector of the autonomous driving vehicle in a reference coordinate system.

*ct C\I

CO O O C\1

D. Transformation Transform the roadside LiDAR point cloud D,. to the coordinate system driving vehicle according to the following formula: Xhdar lidar lidor R. 3 1-3\1.

H n4 4 R = RA, (a1)R, (nR_(7') of the autonomous (16) (17) (18) T= [x' y zi (19) Where: Hn is the transformation matrix from the roadside LiDAR coordinate system to the autonomous driving vehicle coordinate system; xego, y040, and.zego are the coordinates of a point in the roadside LiDAR point cloud after being transformed into the autonomous driving vehicle coordinate system. The corresponding coordinates in the roadside LiDAR coordinate system are [2cmca. y,," z,"".] 0 is the perspective transformation vector. Since there is no perspective transformation in this scene, 0 is set to [0 0 01 E. Voxelization C\I

CO O O C\1

Voxels are the abbreviation for Volume Pixels, which are the smallest unit of digital data segmentation in three-dimensional space. Conceptually similar to pixels, which are the smallest unit in two-dimensional space. By segmenting point cloud data using voxels, data features can be calculated separately for each voxel and the collection of point cloud data within each voxel is called a voxel-level feature. A large class of existing 3D object detection algorithms process LiDAR point cloud data based on voxel-level features. After voxelizing the point cloud data and extracting voxellevel features, these algorithms input them into subsequent 3D object detection network models based on such features to obtain object detection results.

The steps for voxelizing point cloud data are as follows: E,) Based on the spatial dimensions [D W H] of the onboard LiDAR point cloud D design a voxel size [Dr; , and divide the onboard LiDAR point cloud into voxels according to the designed voxel size.

Ez) For the rotated point cloud D;, use the same voxel division method as that used for the onboard LiDAR point cloud DE. to ensure that the spatial grids of the rotated point cloud completely overlap with those of the onboard LiDAR point cloud Dc. For example, if the distribution space of the onboard LiDAR point cloud pc is [-311n,33/n] in X -axis direction and its voxel is 4m, then if the distribution space of the rotated point cloud D: is [-32n 34m], it should be expanded to [-35m,37nt] to obtain an expanded rotated point cloud D; so that their voxel division grids are consistent. The specific calculation formula is as follows: har _slur! Dinh, _end Winn.. _end H start H Igicw end Khaata = Kege? stow nIVIC Kedah start K",,,, _I = K ego end 1 -nTr > K 2 K -lzdar end n n EN 1, 2 Where: Sego is the spatial range of the onboard LiDAR point cloud Dc: Shdal is the spatial range of the expanded transformed point cloud D;; Krdarsdar l and Khaw e"d are the starting and ending values of the expanded transformed, point cloud D: on K dimension; Kmar ctart and Kmar_end are the starting and ending values of transformed point cloud Drt on K dimension; S = ego hdaz Dego _slur! ego _end lQrH ego end _ (20) (21) (22) C\I

CO O O C\1

Kes, ",,, and Kex" en, are the starting and ending values of onboard LiDAR point cloud D, on K dimension; V. is the size of voxel in K dimension.

E3) Grouping is done based on the voxels where scattered data from the point cloud De of the onboard LiDAR and the expanded transformed point cloud Dr6 are located. The scattered data in the same voxel belong to the same group. Due to unevenness and sparsity of points, there may not be an equal number of scattered data in each voxel, and some voxels may have no scattered data.

E4) To reduce computational burden and eliminate discrimination problems caused by inconsistent density, random sampling is performed for voxels with a scatter data volume greater than a certain threshold value. It is recommended that this threshold value be set at 35. When there are fewer scatter data in point cloud data, it can be appropriately reduced. This strategy can save computing resources and reduce the imbalance between voxels.

Through steps Em El, the voxelized onboard LiDAR point cloud De becomes voxelized onboard LiDAR point cloud Ti: , while the voxelized expanded transformed point cloud D: becomes voxelized transformed point cloud D,y.

F. Voxel-Level Feature Calculation Depending on the target detection model used by autonomous driving vehicles, the method for calculating point cloud voxel-level features may vary. Taking the example of using VoxelNet model for object detection in autonomous driving vehicles, the steps are as follows: I) First, organize the voxelized point cloud. For the i -th point in voxel A, its collected raw data is: zi rj (23) Where: y" z, are the X, Y, Z coordinates of the -th point; ri is the reflection intensity of the i -th point.

(,2. Then calculate the average of all point coordinates within that voxel, and denote it as EV V V 1 (:$:, Afterwards, supplement the information of the i -th point by using its offset relative to the center: Where: a, is the information of the i -th point after supplementation; 4) The processed voxelized point cloud is input into the cascaded continuous VFE layers. The schematic diagram of the VFE layer processing voxelized point cloud data is shown in Figure 5. The processing logic of the VFE layer first allows each j), to obtain point-level features through a fully connected network, and then performs max-pooling on the point-level features to obtain voxel-level features. Finally, the voxel-level features are concatenated with the previously obtained point-level features to obtain point concatenation feature results.

(5) After being processed by cascaded continuous VFE layers, the final voxel-level feature is obtained by integrating and max-pooling through a fully connected layer. Each voxel-level feature is a 1 x C dimensional vector.

The above method can be used to obtain voxel-level features DJ for the voxelized point cloud DX of the onboard LiDAR and voxel-level features for the transformed point cloud D. G. Voxel-Level Point Cloud Feature Transmission Due to the sparse existence of point clouds in space, many voxels have no scattered data and therefore no corresponding voxel-level features. Storing point cloud voxel-level features using a special structure can greatly compress the data size and reduce the difficulty of transmission when sending it to processing devices, that is, compressing the point cloud voxel-level features. One of the special structures that can be used is a hash table, which is a data structure that directly accesses based on key code values. It speeds up searching by mapping key code values to a position in the table for access. The hash key of the hash table is the spatial coordinates of voxels, and its corresponding value is a voxel-level feature.

When using sub-scheme I, the subsequent processing is performed on the roadside computing device.

Gi) The autonomous driving vehicle compresses the voxel-level features /Yr' of the onboard LiDAR point cloud and obtains compressed voxel-level features Di of the onboard LiDAR point cloud, which are then transmitted to the roadside computing device. The roadside computing device receives compressed voxel-level features Dt of the onboard LiDAR point cloud and restores them to voxel-level features Di of the onboard LiDAR point cloud.

When using sub-scheme II, subsequent processing is performed on the autonomous driving vehicle.

Go The roadside computing device compresses the voxel-level feature Di of transformed point clouds and obtains compressed voxel-level feature D;' of transformed point clouds, which are then CO transmitted to an autonomous driving vehicle; The autonomous driving vehicle receives compressed voxel-level feature ±);" of transformed point clouds and restores it to voxel-level feature D' of O transformed point clouds.

C\I When using sub-scheme III, subsequent processing is performed in the cloud.

G3) The autonomous driving vehicle compresses its own LiDAR's voxel-level feature Di into compressed voxel-level feature Defi and transmits it to the cloud; Roadside computing devices compress their own transformed points' voxel-level feature Di into compressed voxel-level feature D and transmit it to Cloud as well; Cloud receives both compressed voxel-level features Dr' & Deft, restores them respectively back into original form (D;( -> D/; Die -> H. Data Concatenation and Aggregation Data concatenation operation is performed by aligning the voxel-level features Di of the onboard LiDAR point cloud and the voxel-level features Dyf of the transformed point cloud according to their positions in the coordinate system of an autonomous driving vehicle.

Data aggregation operation is performed by taking one side's voxel-level feature as the aggregated feature for any position where either onboard LiDAR point cloud or transformed point cloud has empty voxels. For voxels that are not empty on both sides, the aggregated voxel-level feature is calculated according to the following formula: Du qi;L. LS {lego_k, ficiar _k le go _k Where: idar k > Lgo_k (25) (26) D.: is the aggregated voxel-level feature; f. is the value of aggregated voxel-level feature nal at position k; feg",, is the voxel-level feature of the point cloud from onboard LiDAR, with value D at position k; is the voxel-level feature of transformed point cloud, with value at position k.

Aggregating features of the same coordinate voxel using the maximum pooling method.

When using sub-scheme I, post-processing is performed on the roadside computing device.

Hi) The roadside computing device uses the above method to concatenate and aggregate voxel-level features Def and Drf of the onboard LiDAR point cloud to obtain aggregated voxel-level feature D[ When using sub-scheme II, post-processing is performed on the autonomous driving vehicle.

H2) The autonomous driving vehicle uses the above method to concatenate and aggregate voxel-level features D: and of of the onboard LiDAR point cloud to obtain aggregated voxel-level feature D. When using sub-scheme Ill, post-processing is performed in the cloud.

H,) The cloud uses the above method to concatenate and aggregate voxel-level features Di and of of the onboard LiDAR point cloud to obtain aggregated voxel-level feature D;21.

I. Object Detection By inputting the aggregated voxel-level features into a subsequent 3D object detection network model, the detection targets can be obtained. Taking VoxelNet as an example, after obtaining the aggregated voxel-level features, they are inputted into a 3D object detection network model based 00 on voxel-level features to obtain the object detection results.

O The object detection results can be represented as U and are specific to: U =[u1 lc] (27)

CD

C\I 11 = [x, y, C, W, D, H, co, v (28) Where: u is the information of the i-th target in the object detection result; x, is the x-axis coordinate of the i-th detected target in the autonomous driving vehicle coordinate system; y, is the y-axis coordinate of the i-th detected target in the autonomous driving vehicle coordinate system; z, is the z-axis coordinate of the i-th detected target in the autonomous driving vehicle coordinate system; C., is the confidence level of detecting ith object; PY; is the width of the detection box corresponding to ith detected object; D, is the length of the detection box corresponding to ith detected object; H, is the height of the detection box corresponding to ith detected object; qt, represents the orientation angle for the detecting box corresponding to ith detected object; v, represents projection on the x-axis direction for motion speed related to i -th detecting objective within an autonomous driving car's coordinates system.

represents projection on the y-axis direction for motion speed related to i -th detecting objective within an autonomous driving car's coordinates system.

17, represents projection on the z-axis direction for motion speed related to i -th detecting objective within the autonomous driving car's coordinates system.

For any 3D object detection network model based on voxel-level features, the detection results should at least include the position of the target, i.e., , y,, z;. For high-performance 3D object detection network models based on voxel-level features, the detection results should include some or all of the attributes C, , W, D,, , vw and 17= of the detected targets. Among them, attributes 14; , D,, H, can only exist simultaneously or not exist simultaneously in the detection results. Attributes vx, , vr, , vs, can only exist simultaneously or not exist simultaneously in the detection results.

When using sub-scheme I, target detection is performed on the roadside computing device.

11) The roadside computing device inputs aggregated voxel-level feature D*af into a 3D object detection network model based on voxel-level features to obtain the target detection result U, and transmits the target detection result to the autonomous driving vehicle.

When using sub-scheme II, target detection is performed on the autonomous driving vehicle.

12) The autonomous driving vehicle inputs aggregated voxel-level feature t)-1 into a 3D object detection network model based on voxel-level features to obtain the target detection result U. When using sub-scheme Ill, target detection is performed in the cloud.

13) The cloud inputs aggregated voxel-level feature D: into a 3D object detection network model based on voxel-level features to obtain the target detection result, and transmits the target detection result U to the autonomous driving vehicle.

CO The present invention has technical key points and advantages including: Using roadside LiDAR as a supplement to the perception of autonomous driving vehicles, O improves the range and accuracy of object recognition. At the same time, using voxel features as data C\J transmitted between vehicles and roads ensures that almost no original data information is lost while reducing bandwidth requirements.

The symbols and their meanings are summarized in the table above.

Symbol Meaning D Roadside LiDAR point cloud D' Transformed point cloud D" Expanded transformed point cloud D, Onboard LiDAR point cloud DT Voxelized transformed point cloud De" Voxelized onboard LiDAR point cloud Df Onboard LiDAR voxel-level features DI Transformed point cloud voxel-level features Df Compressed onboard LiDAR voxel-level features D fi Compressed transformed point cloud voxel-level features Di Aggregated voxel-level features U Target detection results HH H c Installation height of roadside LiDARs a ' b ' C\I

CO O O C\1

0; Vertical field of view angle of roadside LiDARs 0:, 0: The angle between the highest elevation beam and horizontal direction for roadside LiDARs / / - /' 1 Distance between adjacent installation poles for roadside LiDARs V, Extrinsic vector parameters for roadside LiDARs x X-coordinate in the reference coordinate system for roadside LiDARs Yr) Y-coordinate in the reference coordinate system for roadside LiDARs z Z-coordinate in the reference coordinate system for roadside LiDARs cx, Rotation angle around X-axis in the reference coordinate system for roadside LiDARs ign Rotation angle around Y-axis in the reference coordinate system for roadside LiDARs ro Rotation angle around Z-axis in the reference coordinate system for roadside LiDARs x x hth, X coordinate of the point in roadside LiDAR coordinates Y or I d Y coordinate of the point in roadside LiDAR coordinates z 'd or Z coordinate of the point in roadside LiDAR coordinates X7 X coordinate of the point in reference coordinates yreal Y coordinate of the point in reference coordinates 7, "al Z coordinate of the point in reference coordinates R,(a") Sub-rotation matrix calculated based on extrinsic parameter a() Ry(60) Sub-rotation matrix calculated based on extrinsic parameter go R2(n) Sub-rotation matrix calculated based on extrinsic parameter x", E(R, T) Objective error function of ICP algorithm R Rotation transformation matrix T Translation transformation matrix A The coordinates of the i -th point in the target point set P q, The point in the source point set Q that forms the closest pair with point fe" Position and angle vector of roadside LiDAR relative to autonomous driving vehicles V Position and angle vector of autonomous driving vehicles in the reference coordinate system Tr Transformation matrix from the coordinate system of roadside LiDAR to that of autonomous driving vehicles xego X-coordinate of a point in the point cloud data from roadside LiDAR after being transformed into the coordinate system of autonomous driving vehicles 3",",,," Y-coordinate of a point in the point cloud data from roadside LiDAR after being transformed into the coordinate system of autonomous driving vehicles zego Z-coordinate of a point in the point cloud data from roadside LiDAR after C\I

CO O O C\1

being transformed into the coordinate system of autonomous driving vehicles D, W, H Spatial dimension size where onboard LiDAR obtains its point cloud data Dr,-, W,, Hi; Design the size of dimensions D, W, and H Sw The spatial range of the onboard LiDAR point cloud data to kST Irclal The spatial range of the expanded roadside LiDAR point cloud data K haat-slarl The starting value of the expanded transformed point cloud range in K dimension The ending value of the expanded transformed point cloud range in K Kliclar end dimension KMar start The starting value of the transformed point cloud range in K dimension K!ir k, end The ending value of the transformed point cloud range in K dimension V, The size of the voxel in K dimension ci, The raw data of i -th point in voxel A The X, Y, and Z coordinates of the i -th point in voxel P r, The reflection intensity of the i -th point inside voxel P v vz The mean value of the coordinates of all points within the voxel.

et; The data of the i -th point inside voxel A after additional information is provided fk The value of aggregated voxel-level feature Der at position k 1,,,, k The value of feature D{ at position k for voxel-level point cloud of the onboard LiDAR Aciar _k The value of voxel-level feature n: for the point cloud at position k The above nouns and their corresponding meanings are summarized in the following table: Noun Meaning Point cloud data The data detected by the LiDAR can be processed through transformation or voxelization to become point cloud data.

LiDAR Light Detection And Ranging devices installed roadside or onboard Roadside LiDAR LiDAR devices installed roadside.

Roadside computing The computing device corresponding to the roadside LiDAR.

device Extrinsic parameters of The position and angle of the roadside LiDAR in the reference roadside LiDAR coordinate system.

Autonomous driving An autonomous driving vehicle using this patent's vehicle-road vehicle cooperative solution.

Roadside LiDAR point The data detected by the roadside LiDAR belongs to point cloud cloud data.

Coordinate system of A coordinate system established based on an autonomous driving autonomous driving vehicle.

vehicle Transformed point cloud Point cloud data after deviating from roadside LiDAR to autonomous driving vehicle coordinate system.

C\I CO C\1 Onboard LiDAR Onboard LiDAR installed on an autonomous driving vehicle.

Onboard LiDAR point The data detected by onboard LiDAR belongs to point cloud data. cloud

Voxelized point cloud Point cloud data after three-dimensional segmentation using voxels for point cloud data Voxelized transformed Point cloud data after three-dimensional segmentation using voxels point cloud for deviated point clouds Voxelized onboard LiDAR point cloud Point cloud data after three-dimensional segmentation using voxels for onboard LiDAR point clouds Deep neural network Multiple layers of interconnected neurons, including feature extraction network and object detection network.

Feature extraction A deep neural network designed to automatically learn and extract network relevant features from point cloud data.

Object detection network A deep neural network designed to locate and identify objects within a point cloud Voxel-level features of Voxel-level features calculated based on voxelized onboard LiDAR onboard LiDAR point point clouds clouds Voxel-level features of Voxel-level features calculated based on voxelized deviated point transformed point clouds clouds Voxel-level features of Compressed voxel-level features compressed point clouds Scattered data One scattered dataset corresponds to one single dot in the Point Cloud Data set.

Data stitching Concatenation of voxel-level features from different sources according to their respective coordinates.

Data aggregation Aggregation calculation of voxel characteristics at the same coordinates after concatenating them together, so that each coordinate corresponds to only one characteristic feature.

Aggregated voxel-level Voxel-level feature obtained after concatenation and aggregation of features all relevant datasets.

Object detection results The output results of the 3D object detection network model include but are not limited to the position, size, angle, speed, and other characteristics of detected objects.

Localization data for Autonomous driving vehicles rely on positioning data obtained from autonomous driving sensors such as GPS and RTK.

vehicles

Attached Figure Brief Description

Figure 1 shows the flowchart of the proposed vehicle-road cooperative target detection method based on neural network feature sharing; Figure 2 shows a schematic diagram of a roadside mechanical rotating LiDAR deployment; Figure 3 shows a schematic diagram of a roadside solid-state LiDAR deployment; Figure 4 shows a schematic diagram of a roadside solid-state LiDAR deployment (two reverse solid-state LiDARs installed on the same pole); Figure 5 illustrates the processing of point cloud data by the VFE layer; Figure 6 illustrates voxelization feature extraction and aggregation; Figure 7 illustrates voxel point cloud object detection after merging; Figure 8 demonstrates coordinate transformation for roadside LiDAR point clouds; Figure 9 compares the results of target detection (the left image is from this patent's proposed vehicle-road cooperative detection method, and the right image is from directly taking their respective high-confidence target detection results).

Specific Implementation Method The following detailed description of the present invention is provided in conjunction with the accompanying drawings and specific implementation methods. The present invention relates to a collaborative target detection method based on neural network feature sharing. It can be roughly divided into three main steps: The first step is the installation and pre-calibration work of roadside LiDAR sensors. The layout of roadside LiDARs depends on the existing roadside pillar facilities and the type of installed LiDARs in vehicle-road cooperative scenes. Existing roadside LiDARs are installed in pole or crossbar form, specifically located on infrastructure pillars such as roadside gantries, streetlights, and signal light poles that have power support.

For intersection scenarios, it is sufficient to deploy a roadside LiDAR with a detection range greater than or equal to the scene range or containing key areas within the scene. For long-distance large-scale complex scenes such as expressways, highways, and parks, it is recommended to follow the guidelines for deploying roadside LiDARs in this invention so that their coverage area meets full coverage requirements for scenes; i.e., a single road-side LiDAR supplements blind spots under other road-side LiDARs within its coverage area to achieve better vehicle-road cooperative target detection CO results. In vehicle-road cooperation schemes, using roadside LiDARs enhances autonomous driving vehicles' perception capabilities by obtaining information about surrounding targets relative to their O position and their category, size dimensions, traveling direction, etc. Therefore, the perception C\I capability of Roadside LiDARs themselves should also be as strong as possible including parameters such as radar line number and sampling frequency which should not be lower than those related parameters for onboard LiDARs. In addition, to compensate for shortcomings like easy occlusion by onboard LiDARs while achieving redundant sensing data, the sensing range of roadside LiDARs should ensure covering common areas where occlusion occurs while controlling unobstructed lines-of-sight without obstacles After completing the installation of the roadside LiDAR sensor, to calculate the relative pose between the roadside LiDAR and the onboard LiDAR, it is necessary to calibrate the installation position and angle of the roadside LiDAR, that is, extrinsic parameter calibration. This will obtain coordinate position parameters and angular attitude parameters of the LiDAR relative to a certain reference coordinate system. First, select at least 4 reflectivity feature points as control points within the detection area of the roadside LiDAR. Reflectivity feature points refer to points with significant differences in reflectivity compared to surrounding objects, such as traffic signs and license plates. The purpose of selecting reflectivity feature points as control points is to facilitate finding corresponding points between point cloud data and a coordinate in a reference coordinate system based on their positions and reflection intensity differences from other points quickly. Control points should be distributed discretely as much as possible. Under conditions allowed by the scene environment and meeting requirements including discrete distribution without any three control points being collinear, control point selection should aim for more rather than fewer for better calibration results. When selecting control point locations within range detected by roadside LiDARs, they should be located farther away from them if possible; usually, this distance should be greater than 50% of the maximum detection distance by LiDAR. If its difficult due to scene limitations when choosing control point locations at 50% or less than the maximum detection distance by LiDAR, increase the number of control points instead. Subsequently, high-precision measurement instruments such as handheld RTK were used to measure the precise coordinates of control points. The corresponding point coordinates were then found in the roadside LiDAR point cloud. When a high-precision map file of the LiDAR deployment scene is available, there is no need to use handheld RTK or other high-precision measuring instruments. Instead, the corresponding feature point coordinates can be directly found in the high-precision map. Finally, a 3D registration algorithm is used to calculate the optimal value of the LiDAR extrinsic parameters vector and its result is used as a calibration result. Commonly used 3D registration algorithms include ICP and NDT algorithms, among which the ICP algorithm is mainly used for LiDAR extrinsic parameter calibration problems. The basic principle of the ICP algorithm is to calculate the optimal matching extrinsic parameters between the target point set P (the coordinate set of control points in the roadside LiDAR coordinate system) and source point set Q (the coordinate set of control points in the reference coordinate system), so that error function can be minimized.

The method for calibrating roadside LiDAR extrinsic parameters here is not limited but it should ensure that calibration results contain three-dimensional world coordinates of sensors as well as pitch angle, yaw angle, and roll angle for subsequent steps involving point cloud transformation.

The second step is processing and feature extraction of LiDAR point cloud data at the vehicle and road ends. In the actual process of cooperative autonomous driving between vehicles and roads, CO first obtain real-time world coordinates, pitch angle, yaw angle, and roll angle of the vehicle based on the automatic driving's positioning module. Based on the calibration results of RTK positioning of O the vehicle and roadside LiDAR, calculate the relative pose of the automatic driving vehicle to roadside C\I LiDAR, and deflect roadside LiDAR point cloud data into the vehicle coordinate system.

According to the spatial dimension size where onboard LiDAR point clouds are located, design voxel sizes for partitioning onboard LiDARs into voxels. For transformed point clouds, use a voxel partition method that is consistent with that used for onboard LiDARs to ensure that spatial grids for partitioning transformed point clouds completely overlap those for onboard LiDARs. Group scattered data in each voxel according to their locations in both onboard LiDAR point clouds and expanded transformed ones; scattered data within one voxel belong to one group. Due to unevenness and sparsity among points, there may not be an equal number of scattered data in each voxel or some voxels may have no scattered data at all. To reduce computational burden while eliminating discrimination problems caused by inconsistent densities among voxels, randomly sample voxels whose amount exceeds a certain threshold (recommended value: 35) to save computing resources while reducing imbalances among voxels when there are few scattered data in a given set. Referring to Figure 6, the two sets of point cloud data are divided into several discrete voxels using a fixed-size lattice and then expanded. The voxelization method mentioned above is used to calculate the feature vectors of each voxel separately. Taking the VoxelNet network model, which is relatively classic in three-dimensional object detection algorithms, as an example, multiple consecutive VFE layers are used to extract feature vectors for each voxel. That is, the offset of each scattered data relative to the center within the voxel is used to supplement its spatial information. The processed voxelized point cloud data is input into a cascade of consecutive VFE layers. The schematic diagram of processing voxelized point cloud data by the VFE layer can be seen in Figure 5. The processing logic of the VFE layer first makes each expanded scattered data obtain point-level features through a fully connected network layer; then it performs max-pooling on these features to obtain voxel-level features; finally, it concatenates these with previously obtained point-level features and obtains concatenated results at point level. After being processed by cascaded consecutive VFE layers, final voxel-level features are obtained through integration and max-pooling via fully connected layers.

Due to the sparse existence of point clouds in space, many voxels have no scattered data and therefore no corresponding voxel-level features. Storing voxel-level features of point clouds using a special structure can greatly compress the data size and reduce the difficulty of transmission when sending it to processing devices. One such special structure is a hash table, which is a data structure that directly accesses data based on key values. It speeds up searching by mapping key values to positions in the table for accessing records. In this case, the hash key of the hash table is the spatial coordinates of voxels, while its corresponding value represents voxel-level features.

The third step is to aggregate the voxel-level features and transformed voxel-level features of the onboard LiDAR point cloud data to obtain aggregated voxel-level features and perform target detection.

Before performing data aggregation and data stitching, it is necessary to compress the voxellevel features of the point cloud and transmit them to a computing device. The computing device can be a roadside computing device, an autonomous driving vehicle, or a cloud. When using sub-scheme I, data aggregation, data stitching, and subsequent processing are performed on the roadside computing device; when using sub-scheme II, data aggregation, data stitching, and subsequent processing are performed on the autonomous driving vehicle. When using sub-scheme III, data CO aggregation, data stitching, and subsequent processing are performed in the cloud.

During the process of data stitching and data aggregation, voxelization does not change the O spatial relative position of point clouds. Therefore, it is still possible to supplement the voxel-level C\I features of onboard LiDAR point clouds based on the transformed point cloud voxel-level features in the previous step. This involves aligning the voxel-level features of onboard LiDAR point clouds and transformed point clouds according to their positions in the coordinate system of an autonomous driving vehicle. During data aggregation, if one side's voxels are empty at a certain position while the other side's voxels are not empty, then we take that non-empty side's voxel-level feature as our aggregated feature for that position. For two sets of vector characteristics with identical spatial coordinates in both datasets, we use the maximum value pooling method to aggregate them into a single vector characteristic. For non-overlapping vectors, we keep only those from non-empty voxels.

After inputting the aggregated voxel-level features into the subsequent 3D object detection network model, the detection targets are obtained. As shown in Figure 7, taking the VoxelNet network model as an example, the concatenated data is inputted into the continuous convolutional layer of the VoxelNet network model to obtain spatial feature maps. Finally, these feature maps are fed into the RPN (Region Proposal Network) of the VoxelNet network model to obtain the final object detection results.

The present invention has the following technical key points and advantages: Using roadside LiDAR as a supplement to autonomous driving vehicle perception improves the range and accuracy of object recognition. At the same time, using point cloud voxelization features as data transmitted between vehicles and roads ensures that almost no original data information is lost while reducing bandwidth requirements for data transmission.

At the intersection of the School of Transportation Engineering at Tongji University's Wading campus, an experimental scene is set up. In this scene, there are poles with a height of 6.4m every 20 meters on the road section. Innovusion Jaguar array-type 300-line LiDAR and Ouster 128-line 360° LiDAR are used as roadside LiDARs in this experiment. The vertical field of view angle of Innovusion Jaguar array-type 300-line LiDAR is 40°, and its maximum detection distance is 200m. The vertical field of view angle for Ouster's 128-line,360-degree LiDAR is 45 degrees, and its maximum detection distance is 140 m. Autonomous driving vehicles use Ouster's 64-line 360-degree radar as their onboard LiDAR which has been installed horizontally at a height of 2m. The onboard LiDAR and the vehicle body are rigidly connected, and their relative posture and displacement remain unchanged. It has been calibrated during factory production, and the position and angle correction can be made in real-time based on RTK measurements obtained from the vehicle when it moves.

Implementation example 1 is as follows: (1) Deployment and calibration of roadside LiDAR sensors Only Ouster 128-line 360° LiDAR is used, considering the size of the LiDAR itself. The installation height of Ouster 128-line 360° LiDAR is set to be 6.5m, with one installed every five poles, which meets the guidelines for deployment of roadside mechanical rotating and mixed solid-state LiDARs.

Six reflectivity feature points are selected as control points within the area covered by the LiDAR. These six control points are located at the base of two poles on each side at distances of 80m, 100m, and 120m from the pole where the LiDAR is installed. Since there is a certain curvature in this section of the road, any three control points satisfy non-collinear conditions. The precise coordinates of these control points are measured using handheld RTK devices and matched with corresponding coordinates in point clouds obtained by the LiDAR sensor using the ICP algorithm for calibration.

CO (2) Processing and feature extraction of point cloud data.

After the calibration work in (1), the position of the roadside LiDAR point cloud in the coordinate O system of the autonomous driving vehicle can be obtained, as shown in Figure 8. The roadside LiDAR C\I point cloud is aligned with the coordinate system of the autonomous driving vehicle. The deviated point cloud is divided into voxels according to a fixed size grid [0.4 m 0.4 m 0.5 m] and expanded, resulting in voxelized deviated point clouds. After supplementing each scattered data with voxel mean information within voxelized deviated point clouds, voxel-level features are calculated by inputting them into multi-layer VFEs. For voxels that do not contain scattered data, no calculation is needed, and each voxel is finally represented by a 128-dimensional feature vector. The computed voxel-level features are stored in a hash table by roadside computing devices, where spatial positions of voxels serve as hash keys and corresponding contents represent their respective voxel-level features, resulting in compressed deviated point cloud voxel-level features. Similarly, onboard LiDAR point clouds are processed until obtaining onboard LiDAR's compressed voxel-level features without establishing a hash table for its data unlike roadside processing does above mentioned steps. At this time, compared with the original raw point cloud data, the size of the data has been reduced to about one-tenth approximately.

(3) Data concatenation, data aggregation, and target detection of voxel-level features Autonomous driving vehicles receive compressed transformed point cloud voxel-level features sent by roadside computing devices and restore them to transformed point cloud voxel-level features. Since the coordinate system of the received transformed point cloud voxel-level feature has been rotated to the coordinate system of the autonomous driving vehicle, it can be directly concatenated with the same coordinate system's onboard LiDAR point cloud voxel-level feature data. The method of maximum value pooling is used for data aggregation operation on voxels with identical coordinates. For example, if we have two sets of voxel-level features [15, 45, 90,... ,17] and [8, 17, 110,... ,43], their aggregated result would be [15, 45, 110,... ,43]. After completing all data concatenation and data aggregation operations on all voxel-level features they are input into subsequent RPNs to obtain target detection results. The proposed vehicle-road collaborative detection method and direct fusion-based onboard LiDAR point cloud and roadside LiDAR point cloud target detection results along with confidence levels are plotted in a bird's eye view as shown in Figure 9. It can be seen that sharing neural network characteristics for vehicle-road collaborative target detection can significantly improve accuracy while reducing bandwidth requirements for data transmission.

Implementation example 2 is as follows: (1) Deployment and calibration of roadside LiDAR sensors When only using the Innovusion Jaguar array-type 300-line LiDAR and setting only one LiDAR per pole, the installation height of the LiDAR is 6.5m, with a tilt angle of 7°, and one is installed every eight poles. This meets the guidelines for deploying solid-state LiDARs on roadsides.

Six reflectivity feature points are selected within the LiDAR area as control points. The six control points are located at the base of two adjacent poles on both sides of the road at distances of 100m, 120m, and 140m from the pole where the LiDAR is installed. Since there is some curvature in this section of the road, any three control points satisfy non-collinear conditions. The precise coordinates of these control points are measured using a handheld RTK device and matched to their corresponding coordinates in the point cloud data obtained by the LiDAR sensor. Finally, an ICP algorithm is used to calibrate the LiDAR.

(2) Processing and feature extraction of point cloud data.

CO Following the steps in Example 1, obtain voxel-level features for the deviated point cloud and onboard LiDAR point cloud. The calculated voxel-level features of the onboard LiDAR point cloud are O stored in a hash table by using the spatial position of each voxel as a hash key, with the corresponding C\I content being the respective voxel-level feature. This results in compressed voxel-level features for the onboard LiDAR point cloud that can be used by autonomous driving vehicles.

(3) Data concatenation, data aggregation, and target detection of voxel-level features The roadside computing device receives compressed voxel-level features of the onboard LiDAR point cloud sent by the autonomous driving vehicle and decompresses them to restore the voxellevel features of the onboard LiDAR point cloud. The subsequent steps of data concatenation, data aggregation, and target detection are the same as those in Example 1 (3). After obtaining the target detection results, the roadside computing device sends them to the autonomous driving vehicle.

Implementation example 3 is as follows: (1) Deployment and calibration of roadside LiDAR sensors When only using the Innovusion Jaguar array-type 300-line LiDAR and setting two reverse LiDARs on each pole, the installation height of the LiDAR is 6.5m, with a tilt angle of 7°, and two are installed between every nine poles, which complies with the guidelines for deployment of roadside solid-state LiDAR.

Six reflectivity feature points are selected within the area covered by the LiDAR as control points. The six control points are located at both sides of poles at distances of 100m, 120m, and 140m from the pole where the LiDAR is installed. Since there is some curvature in this section of the road, any three control points satisfy non-collinear conditions. The precise coordinates of these control points are measured using handheld RTK devices and matched to corresponding coordinates in point clouds obtained by LiDAR. The ICP algorithm is used to calibrate the LiDAR.

(2) Processing and feature extraction of point cloud data.

Obtain compressed transformed voxel-level features in step (2) of Example 1, and obtain compressed onboard LiDAR point cloud voxel-level features in step (2) of Example 2.

(3) Data concatenation, aggregation, and object detection of voxel-level features The cloud receives the compressed onboard LiDAR point cloud voxel-level features sent by the autonomous driving vehicle and restores them to the onboard LiDAR point cloud voxel-level features. The cloud also receives the compressed transformed voxel-level features sent by roadside computing devices and restores them to transformed voxel-level features. Subsequent steps for data concatenation, aggregation, and object detection are the same as those in step (3) of Example 1 until obtaining object detection results. The cloud then sends these results to the autonomous driving vehicle.

The above is only the preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any skilled person in this technical field can easily conceive changes or substitutions within the technical scope disclosed by the present invention, which should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by what is claimed. C\I

CO O O C\1

Claims

Claims 1. A method of infrastructure-augmented cooperative perception for autonomous vehicles based on voxel feature aggregation, comprising the following steps: Preparation stage: A. Install a roadside LiDAR and configure the corresponding roadside computing device for said roadside LiDAR; B. Calibrate the extrinsic parameters of said roadside LiDAR; when calibrating the extrinsic parameters of the roadside LiDAR, the number, spatial distribution, and collinearity of control points are considered when selecting feature points within the scanning area of the roadside Li DAR as control points; using the ICP algorithm to calculate the extrinsic parameters by taking coordinates of control points in the roadside LiDAR coordinate system and RTK-measured reference coordinate system as target point set and source point set respectively; Application stage: C. Said roadside computing device calculates the relative pose of an autonomous driving vehicle concerning said roadside LiDAR based on its positioning data and extrinsic parameters; D. Said roadside computing device converts the point cloud detected by said roadside LiDAR into a coordinate system aligned with that of the said autonomous driving vehicle using its relative pose, obtaining transformed point clouds; E. Said transformed point cloud is voxelized by said roadside computing device to obtain a voxelized transformed point cloud, while a voxelized point cloud is also obtained from onboard Li DAR data processed similarly; F. Said roadside computing device calculates the voxel-level features of said voxelized transformed point cloud through a feature extraction network and obtains the voxel-level features of the transformed point cloud; said autonomous driving vehicle calculates the voxellevel features of the onboard LiDAR point cloud through said feature extraction networks, and obtains the voxel-level features of the onboard LiDAR point cloud; G. Said roadside computing device compresses and processes the voxel-level features of transformed point clouds to obtain compressed voxel-level features, which are then transmitted to autonomous driving vehicles; autonomous driving vehicles receive compressed voxel-level features and restore them to transformed point cloud's original voxel level feature; H. Autonomous driving vehicles perform data stitching and aggregation on both onboard LiDAR point cloud's voxel-level feature and transformed point cloud's voxel-level feature to obtain aggregated voxel-level feature; I. Autonomous driving vehicles input aggregated voxel-level features into a 3D object detection network model based on voxel-level characteristics to obtain object detection results.
2. A method as claimed in claim 1, characterized in that the configuration criteria for roadside LiDAR are: (I: For the case of installing mechanical rotating LiDAR and two reverse solid-state Li DARs on the same pole on the roadside, at least should meet: H > tan(t9) Where: I/ represents the installation height of the LiDAR; B2 represents the angle between the highest elevation beam of the LiDAR and the horizontal direction; L represents the distance between adjacent mounting poles for LiDARs; For roadside installation of roadside solid-state LiDAR, the following requirements should be met at least: He tan(-rt --0;-)H, tan(6) 2 Where: HI, represents the installation height of the roadside solid-state LiDAR; 01; represents the vertical field of view angle of the roadside solid-state LiDAR; O represents the angle between the highest elevation beam of the roadside solid-state LiDAR and the horizontal direction; Lb represents the distance between adjacent mounting poles for roadside solid-state LiDARs.
3. A method as claimed in claim 1, characterized in that the transformed point cloud is expanded during the point cloud voxelization process to ensure that the voxel partition grids of the onboard LiDAR point cloud D, and the expanded transformed point cloud 1),", are consistent, with a calculation formula of: Kiav sr; -K ego A thrt { 11 V 1( K I law,tar t K hdar end = K ego end- -n2I"k' > K?friar end {1,, n, EN Where: K tau ' K Ininr end are the starting and ending values of the expanded transformed point cloud De in K dimensions; K, K, " K are the starting and ending values of the transformed point cloud Di, in K 71chderr_,Iid dimensions; VA. is the size of voxels in K dimension.
4. A method as claimed in claim 1, characterized in that when extracting voxel-level features of point clouds, the information of points is supplemented by using the offset from the center.= [xi yr z, - -v, . -Where: ii is the information of the i -th point in voxel A after supplementation; x, , y,, z, are the coordinates of the i -th point in voxel A; ri is the reflection intensity of the i -th point in voxel A; v it are the mean values of all points' coordinates within voxel A.
5. A method as claimed in claim 1, characterized in that the voxel-level feature data aggregation method uses a maximum value pooling method to aggregate voxel-level features with the same coordinates, which is expressed by the following formula: = 1 ego _k f idar k.fidar lr -<J e go ic.flidar k> f Jego Where: ft, is the value of aggregated voxel-level feature Drf at position k; fego_, is the value of vehicular LiDAR point cloud voxel-level feature n: at position k; is the value of transformed point cloud voxel-level feature Rf at position k.