As China’s autonomous driving penetration and commercialization pace accelerate, perception systems, as a key technology for achieving advanced autonomous driving, have garnered widespread industry attention and become the core battleground for measuring an autonomous driving company’s technological competitiveness.
As a leading company in autonomous vehicle commercialization, Autowise has been deeply engaged in the field of autonomous driving. Leveraging its full-stack self-developed autonomous driving technology and extensive experience across passenger cars, commercial vehicles, and specialized operational scenarios, the company has established a comprehensive and mature perception technology system. Autowise has independently developed an industry-leading BEV multi-task perception module, characterized by multi-tasking, multi-modality, and cross-temporal capabilities, along with a data closed-loop system that supports rapid BEV iteration.
1. Breakthroughs in Autowise’s BEV Perception Technology for Vertical Scenarios
In traditional autonomous perception technology stacks, 2D images are fed into the perception module to generate 2D perception results. These are then processed using sensor fusion techniques, integrating multi-camera 2D results and LiDAR data before passing the inferred results downstream. This approach relies heavily on manually defined rules and faces challenges such as occlusion, incomplete observations, and information loss.
To overcome these limitations, BEV technology unifies multi-sensor features into a single 3D space, reducing information loss and enabling direct end-to-end perception within the 3D environment. Based on this BEV approach, Autowise has developed a BEV multi-task perception system tailored for sanitation operations and their unique operational needs, featuring:
Multi-Task Processing for Complex Environments:
To address the highly variable demands of sanitation operations, the Autowise perception module supports not only the detection tasks common to urban traffic scenarios (e.g., vehicles, pedestrians, traffic signs) but also introduces tasks specific to sanitation environments:
  • Low-height obstacle detection
  • Curb detection
  • Garbage detection
  • Dust and water-mist recognition & filtering
  • Collision risk assessment
Compared to conventional designs that use a separate neural network for each task, the BEV multi-task perception module employs a shared backbone network, reducing computational load by over 30% while ensuring real-time environmental awareness.
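As a rough illustration of the shared-backbone idea (the functions below are toy stand-ins, not Autowise’s actual network), several task heads can reuse a single feature pass, so the expensive backbone computation happens once rather than once per task:

```python
# Illustrative shared-backbone, multi-head sketch: one feature extractor
# feeds several task heads. All names and logic here are hypothetical.

def shared_backbone(bev_grid):
    """Stand-in for a BEV feature extractor: here, just per-row sums."""
    return [sum(row) for row in bev_grid]

def detection_head(features):
    """Toy head: flags cells whose feature response exceeds a threshold."""
    return [f > 1.0 for f in features]

def curb_head(features):
    """Toy head: index of the strongest response, standing in for curb position."""
    return max(range(len(features)), key=lambda i: features[i])

def run_multitask(bev_grid):
    feats = shared_backbone(bev_grid)        # computed once
    return {
        "obstacles": detection_head(feats),  # task 1 reuses feats
        "curb_row": curb_head(feats),        # task 2 reuses feats
    }
```

The saving comes from the structure itself: adding a new task costs only one extra lightweight head, not a second backbone.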
Multi-Modal Fusion for Precise, Real-Time Perception:
The BEV perception system integrates surround-view images and multi-LiDAR point clouds, enabling multi-modal, multi-sensor data aggregation to compensate for the weaknesses of single-modal perception. Autowise’s BEV perception achieves:
  • A 32.6% accuracy improvement over pure vision-based algorithms
  • An 18.9% accuracy improvement over single-LiDAR perception models
  • Scalability for integrating additional sensor data in the future
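The per-cell fusion of aligned BEV grids can be sketched as follows; the confidence-weighted average is an illustrative choice, not necessarily the fusion operator Autowise uses:

```python
# Hypothetical sketch of multi-modal BEV fusion: camera and LiDAR features
# that live in the same BEV grid are blended per cell, so each modality can
# compensate where the other is weak (LiDAR gives range, images give texture).

def fuse_bev_cells(camera_bev, lidar_bev, cam_conf, lidar_conf):
    """Confidence-weighted fusion of two aligned BEV feature grids."""
    fused = []
    for cam_row, lidar_row in zip(camera_bev, lidar_bev):
        fused.append([
            (cam_conf * c + lidar_conf * l) / (cam_conf + lidar_conf)
            for c, l in zip(cam_row, lidar_row)
        ])
    return fused
```

Because both inputs already share one grid, adding a third sensor later only means one more term in the weighted sum, which mirrors the scalability claim above.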
Cross-Temporal Perception for Accurate Velocity Prediction:
The BEV perception system supports multi-frame LiDAR input and cross-temporal feature aggregation, allowing it to recover missing information over extended time periods. This significantly enhances perception accuracy and improves target velocity prediction, ensuring more precise motion planning.
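The mechanics can be sketched in the simplest case of two frames with a known ego displacement (the function names and the 2D setup are illustrative assumptions):

```python
# Toy sketch of cross-temporal velocity estimation: a past observation is
# ego-motion compensated into the current frame, and velocity is estimated
# from the remaining displacement over time.

def compensate(position, ego_shift):
    """Shift a past observation into the current ego frame."""
    return (position[0] - ego_shift[0], position[1] - ego_shift[1])

def estimate_velocity(prev_pos, curr_pos, dt, ego_shift):
    """Velocity of a tracked target after removing ego motion."""
    prev_in_curr = compensate(prev_pos, ego_shift)
    return ((curr_pos[0] - prev_in_curr[0]) / dt,
            (curr_pos[1] - prev_in_curr[1]) / dt)
```

Aggregating more than two frames follows the same pattern, with each past frame warped into the current coordinate system before features are combined.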
The following real-world scenarios demonstrate how Autowise’s BEV multi-task perception technology efficiently addresses complex traffic challenges and long-tail problems.
2. Real-World Applications of BEV Multi-Task Perception
Scenario 1: Improving Low-Height Obstacle Detection for Autonomous Sanitation Vehicles
Low-height obstacle detection is a major challenge in autonomous sanitation operations due to:
  1. Long-tail distribution – Unlike common traffic participants (e.g., vehicles, pedestrians), low-height obstacles are diverse and sparse (e.g., pipes, stones, fallen shovels, potholes, thermos flasks).
  2. LiDAR limitations – The irregular size and shape of some obstacles make them difficult for LiDAR to consistently detect.
  3. Operational decision complexity – The system must precisely differentiate between obstacles that require avoidance vs. those that can be safely cleaned (e.g., thin branches can be swept, but large branches may clog cleaning equipment and necessitate rerouting).
To tackle these challenges, Autowise developed a 2D+3D multi-sensor detection system, primarily vision-based with LiDAR assistance, incorporating obstacle attribute prediction, occupancy prediction, and garbage detection.
These enhancements accurately estimate obstacle size and height, enabling intelligent decision-making on whether to reroute or proceed with cleaning operations.
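A hypothetical version of that decision policy might look like the following; the thresholds and attribute names are invented for illustration and are not Autowise’s actual parameters:

```python
# Illustrative reroute-vs-clean policy: the perception module's size/height
# estimates and sweepability class feed a simple decision rule.

def plan_action(height_m, width_m, sweepable):
    """Decide how to handle a detected low obstacle (toy thresholds)."""
    if height_m > 0.25 or width_m > 0.6:
        return "reroute"          # too large: avoid to protect the equipment
    if sweepable:
        return "sweep"            # small, sweepable debris: clean in place
    return "stop_and_report"      # small but not sweepable: flag for removal
```

The point of the sketch is the dependency: a good decision is only as reliable as the upstream size, height, and garbage-class estimates.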
Figure 2: Rosy-red block marks the low-obstacle recognition result: a thin black water pipe
Figure 3: Rosy-red block marks the low-obstacle recognition result: a black garbage bag
Figure 4: Rosy-red block marks the low-obstacle recognition result: a black garbage pile with an unfolded plastic sheet
Figure 5: Rosy-red block marks the low-obstacle recognition result: a non-sweepable white trash pile
Figures 2 through 5 show the autonomous vehicle encountering low obstacles in its operating environment: a thin black water pipe, a black garbage bag, a black garbage pile with an unfolded plastic sheet, and a non-sweepable white trash pile, respectively.
In all four scenarios, the occupancy (Occ) grid output by the Autowise BEV perception module marks each low obstacle with a distinct color block, and the recognized position and orientation are highly accurate. This provides a strong guarantee that the vehicle can operate safely and drive smoothly in complex road environments.
Scenario 2: Advanced 3D Object Detection in Dense Traffic Environments
In congested urban environments, perception systems face challenges such as:
  • Dense vehicle-pedestrian interactions
  • Occlusion and cluttered scenes
  • Unstructured road conditions
  • Rapid environmental changes
Figure 6: At a busy intersection, the integrated 3D detection of Autowise’s BEV perception stably detects and dynamically predicts the many traffic participants passing along a narrow road
As shown in Figure 6, at a busy intersection vehicles approach from all directions: vehicles traveling in the same direction, oncoming vehicles, vehicles preparing to turn, plus large numbers of pedestrians and cyclists occupying the lanes and crosswalks. In this environment, the Autowise BEV perception system detects the position, speed, direction, and size of the obstacles, pedestrians, and bicycles around the vehicle, providing full 360-degree perception of the surroundings throughout driving operations.

Scenario 3: BEV Curb Detection with High-Precision Location, Shape Recognition, and Tracking
In sanitation operations, autonomous vehicles must perform edge cleaning, which requires accurate identification and tracking of curb locations and shapes to ensure cleaning accuracy and safety. However, real-world road environments are complex and variable, and curb recognition faces challenges due to: Varied curb shapes (e.g., straight, curved, angular); Obstructions on the curb (e.g., vegetation, low barriers). These factors can significantly affect the accuracy and robustness of curb detection.
To address these challenges, Autowise has integrated a BEV curb detection task into its BEV network. The system divides the curb into grids and uses deep learning networks to extract the corresponding features from both images and LiDAR. It then fits the curb geometry within each grid and combines geometric features with map priors to accurately identify curbs of different shapes. Additionally, temporal information is incorporated into the curb detection task: results from multiple frames are combined and filtered to further enhance detection stability.
This approach ensures that the vehicle can reliably track and follow the curb even in complex and dynamic road environments, enabling safe and precise cleaning operations.
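One plausible reading of the grid-wise fitting and temporal filtering described above can be sketched as follows (the per-cell “fit” is reduced to a mean lateral offset for brevity, and the exponential filter is an illustrative choice):

```python
# Toy sketch of grid-wise curb fitting with temporal smoothing: curb points
# are grouped into longitudinal grid cells, a simple fit is made per cell,
# and an exponential filter stabilizes the estimate across frames.

def fit_cell_offset(points):
    """Mean lateral offset of curb points in one grid cell (toy 'fit')."""
    return sum(y for _, y in points) / len(points)

def curb_profile(points, cell_len=5.0, n_cells=4):
    """Per-cell lateral curb offset along the driving direction."""
    cells = [[] for _ in range(n_cells)]
    for x, y in points:
        idx = min(int(x // cell_len), n_cells - 1)
        cells[idx].append((x, y))
    return [fit_cell_offset(c) if c else None for c in cells]

def temporal_smooth(prev, curr, alpha=0.7):
    """Blend the previous frame's profile with the current one."""
    return [p if c is None
            else (alpha * p + (1 - alpha) * c if p is not None else c)
            for p, c in zip(prev, curr)]
```

Cells with no observations keep the previous frame’s estimate, which is one simple way a tracker can ride out short occlusions by vegetation or barriers.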
Figure 7: BEV road edge detection, accurately detecting road edge curvature and position (red curve shows road edge detection results)
As shown in Figure 7, the Autowise BEV perception system accurately captures the curvature of the road edge in a curve, allowing the vehicle to select an appropriate speed based on the curb information and maintain a high curb-adherence rate.
Scenario 4: 3D Semantic Segmentation for Effective Recognition of Water Mist and Dust in the Environment
Due to the unique characteristics of sanitation operations, autonomous vehicles often encounter water mist and dust generated during watering or driving. These particles, when detected by LiDAR, can be mistakenly identified as obstacles, interfering with normal operation and navigation. 

To address this, Autowise’s BEV perception system combines image data with LiDAR data to predict semantic information for each LiDAR point. This enables the system to effectively identify and filter out dust, water mist, and water splashed by other vehicles, ensuring that the vehicle’s perception remains accurate and reliable even in challenging environmental conditions.
By integrating 3D semantic segmentation, the system can differentiate between environmental disturbances and actual obstacles, enhancing the vehicle’s operational efficiency and safety in various weather and environmental conditions.
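Once each LiDAR point carries a predicted semantic class, the filtering step itself is straightforward; the class names below are illustrative, not Autowise’s actual label set:

```python
# Toy sketch of point-level noise rejection: points whose predicted semantic
# class marks them as mist/dust/spray are dropped before the point cloud
# reaches downstream detection.

NOISE_CLASSES = {"water_mist", "dust", "splash"}

def filter_noise_points(points, labels):
    """Keep only LiDAR points whose predicted semantic class is not noise."""
    return [p for p, lab in zip(points, labels) if lab not in NOISE_CLASSES]
```

The hard part, of course, is the per-point classification itself, which is where the image modality contributes the semantic cues that raw point clouds lack.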
Figure 8: Before water mist treatment
Figure 9: After water mist treatment
Figure 8 shows a sprinkling operation: the vehicle is surrounded by dense water mist, which would be mistakenly detected as an obstacle from the point-cloud modality alone. Through multi-modal point-cloud semantic segmentation, the BEV perception system accurately recognizes and rejects the noise generated by the water mist (Figure 9), effectively reducing the impact of bad weather and sprinkling operations on the perception accuracy of the autonomous vehicle.
3. Autowise Builds an Efficient Data Feedback Loop to Accelerate R&D Iteration and Achieve Cost Reduction and Efficiency Improvement
Figure 10: The data closed-loop pipeline
The data feedback loop has become a core strategy and key path for addressing the long-tail problem in autonomous driving. The ability to efficiently process large volumes of data and quickly optimize algorithm models is crucial for the rapid iteration of autonomous driving technology. Autowise’s autonomous vehicles have collectively traveled over 13 million kilometers worldwide. Based on this vast amount of data, Autowise has established a highly efficient data mining, labeling, simulation, and model iteration feedback loop (as shown in Figure 10), which facilitates the rapid iteration of autonomous driving technology, driving cost reduction and efficiency improvements.
  • Data Mining to Enrich Long-Tail Database
Figure 11: Scenario-specific data mining
Figure 11 shows an example of Autowise mining long-tail scene data with a large model. Autowise builds a multimodal text-to-image model so that, when a specific type of scene data is needed, a natural-language description such as “non-standing pedestrian” is enough to automatically mine matching scene data from the vast pool of historical data, greatly improving the efficiency of long-tail scene mining.
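The retrieval step behind this kind of text-driven mining can be sketched with a shared text-image embedding space, CLIP-style; the embedding model itself is assumed and not shown:

```python
# Toy sketch of text-to-scene retrieval: the query text and logged frames are
# both embedded into one vector space, and the closest frames are returned.

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def mine_scenes(query_emb, frame_embs, top_k=2):
    """Return indices of the top_k logged frames closest to the text query."""
    scored = sorted(range(len(frame_embs)),
                    key=lambda i: cosine(query_emb, frame_embs[i]),
                    reverse=True)
    return scored[:top_k]
```

In practice the frame embeddings would be precomputed offline, so a new text query only costs one encoder pass plus a nearest-neighbor search over the log index.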
  • Automatic Data Labeling to Save Costs and Improve Efficiency
For vast amounts of unlabeled data, Autowise enhances the utilization of such data through semi-supervised learning, data augmentation, and other methods. Additionally, Autowise has developed an automatic labeling system based on BEV multi-modal data, significantly increasing labeling efficiency, shortening the model iteration cycle, and drastically reducing labeling costs.
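A minimal sketch of the pseudo-labeling loop common to such semi-supervised pipelines follows; the confidence threshold and control flow are illustrative, not a description of Autowise’s actual system:

```python
# Toy pseudo-labeling sketch: a model trained on labeled data scores
# unlabeled samples, and only high-confidence predictions are kept as
# automatic labels for the next training round.

def auto_label(samples, predict, conf_threshold=0.9):
    """Return (sample, predicted_label) pairs kept as pseudo-labels."""
    kept = []
    for s in samples:
        label, conf = predict(s)
        if conf >= conf_threshold:
            kept.append((s, label))
    return kept
```

Human annotators then only review the uncertain remainder, which is where the labeling-cost reduction comes from.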
  • 2D and 3D Data Simulation to Improve BEV Perception Model Generalization
In the data simulation phase, Autowise combines 2D and 3D simulation to generate highly realistic long-tail scenarios. This enriches the training data for autonomous driving algorithms, offering a more comprehensive training and testing environment for the perception model.
Figure 13: Example of 2D data simulation
An example of 2D data simulation is shown in Figure 13. For long-tail scenes encountered in real operations (Figure 13, left), Autowise partially annotates the scene semantically, either with a large model or manually (Figure 13, middle), and then generates high-fidelity images with a diffusion model (Figure 13, right). The annotated obstacles, such as water pipes, barrels, and traffic cones, remain highly consistent semantically with the original image, while the background and other unannotated regions stay close to the real scene. Through this kind of multi-scene generation and training, Autowise’s autonomous vehicles can better adapt to complex and unpredictable road environments, improving overall safety and reliability.
In its application of 3D data simulation, Autowise reconstructs long-tail scenes with 3D Gaussian Splatting, generating high-fidelity novel views of corner cases. This significantly enriches the long-tail scene library and better supports perception-model training and the testing of new scenarios.

Figure 14: Scene of a vehicle passing normally
Figure 15: New perspective data generated after scene modeling
Figure 14 presents a scenario recorded by a vehicle during normal driving, in which the long, thin warning line hanging across the left side of the image is a rare element and poses a great challenge to the perception system. Using 3D simulation, Autowise generates high-fidelity novel views that include the warning line and other obstacles (Figure 15), in particular views where the hanging line appears at critical positions on the road ahead. This supports testing of the new scenario “warning line hanging over the road ahead” and effectively improves the generalization capability and robustness of the autonomous driving system in complex environments.

The examples above illustrate the technical innovations and applications Autowise has achieved in autonomous driving perception. Going forward, Autowise will continue to follow its path of combining hardware and software, actively explore new application scenarios for perception technology, share its latest research results, stay focused on the commercialization of autonomous driving, and drive continued innovation in autonomous driving technology.