Investigation of Audio Controlled Edge Semantic Segmentation System for Visually Impaired People

This work investigates an audio-controlled edge semantic segmentation system intended to improve assistive technology for visually impaired people. The multimodal system uses audio input to steer the semantic segmentation process: it pairs separate convolutional networks for visual semantic segmentation and for audio keyword spotting, so users can select specific object categories with simple voice commands. The selected category dynamically filters the visual output, which reduces cognitive load and emphasizes the environmental details most relevant to the user. The study analyzes the trade-off between segmentation accuracy and computational efficiency at different image resolutions, and it proposes several post-processing techniques to improve the user experience. On-device operation is demonstrated on the NVIDIA Jetson AGX Orin platform.
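
To make the voice-driven filtering idea concrete, here is a minimal sketch of how a keyword-spotting result might gate a segmentation map. The class table, the function name filter_segmentation, and the use of a plain per-pixel class-ID map are illustrative assumptions, not details confirmed by the paper; a real system would run the two CNNs where the placeholders sit.

```python
"""Hypothetical sketch of voice-selected filtering of a segmentation map.

Assumptions (not taken from the paper): the vision network outputs an
(H, W) array of integer class IDs, the keyword-spotting network outputs
a label string, and the category names below are made up for the example.
"""
import numpy as np

# Illustrative class table; the paper's actual category set may differ.
CLASS_IDS = {"person": 1, "car": 2, "door": 3, "stairs": 4}

def filter_segmentation(seg_map: np.ndarray, keyword: str) -> np.ndarray:
    """Keep only pixels of the voice-selected class; zero out the rest.

    seg_map: (H, W) integer class-ID map from the segmentation network.
    keyword: label produced by the keyword-spotting network.
    """
    class_id = CLASS_IDS.get(keyword)
    if class_id is None:
        # Unknown command: pass nothing through rather than everything,
        # so an unrecognized keyword never floods the user with output.
        return np.zeros_like(seg_map)
    return np.where(seg_map == class_id, seg_map, 0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder segmentation output; a real system would run the CNN here.
    seg = rng.integers(0, 5, size=(480, 640), dtype=np.int32)
    filtered = filter_segmentation(seg, "stairs")
    kept = int((filtered > 0).sum())
    print(f"kept {kept} of {seg.size} pixels for class 'stairs'")
```

The point of gating at the mask level is that only the user-requested category survives into whatever downstream rendering or sonification the assistive device performs, which is how the system keeps the presented information sparse.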