A new model, the Dual Channel Feature Fusion Network (DC-FFNet), has been developed to improve the classification of steady-state visual evoked potentials (SSVEPs) in brain-computer interface (BCI) systems. SSVEP-based BCIs are known for their high accuracy and fast response, which makes them well suited to controlling external devices. However, current classification methods still struggle to combine high accuracy with real-time performance, especially in complex environments. DC-FFNet addresses this with a dual-channel architecture that uses a multi-head self-attention mechanism to capture global and local features and fuse them. The model achieved classification accuracies of 91.80% on the SSVEP_SANDIEGO dataset and 90.93% on a self-recorded dataset, outperforming previous models. In addition, the accompanying real-time control framework integrates an asynchronous control mechanism. This framework significantly reduces response time and raises the system's information transfer rate (ITR) to 128.66 bits/min. The approach is expected to advance SSVEP signal processing for multi-device control systems. By improving the trade-off between classification performance and real-time capability in BCI technology, it could benefit individuals with disabilities.
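ITR combines accuracy, the number of targets, and the time per selection into a single figure. The text above does not say which formula or parameters produced the 128.66 bits/min result. As a sketch, the code below assumes the widely used Wolpaw definition:

ITR = (60 / T) * [log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1))]

Here N is the number of targets, P is the classification accuracy, and T is the time per selection in seconds. The function name `wolpaw_itr` and the example values are hypothetical and are not taken from the DC-FFNet work.

```python
import math


def wolpaw_itr(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Information transfer rate in bits/min (Wolpaw formula; assumed, not from the source).

    n_targets: number of selectable SSVEP targets (N)
    accuracy: classification accuracy P in [0, 1]
    seconds_per_selection: time per selection T in seconds
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if seconds_per_selection <= 0:
        raise ValueError("selection time must be positive")

    n, p = n_targets, accuracy
    # At or below chance level (P <= 1/N), report zero, a common convention assumed here.
    if p <= 1.0 / n:
        return 0.0

    # Bits conveyed per selection.
    bits = math.log2(n) + p * math.log2(p)
    # The error term vanishes at P = 1 (0 * log 0 is taken as 0).
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))

    # Convert bits per selection into bits per minute.
    return bits * 60.0 / seconds_per_selection


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not the settings behind 128.66 bits/min.
    print(round(wolpaw_itr(12, 0.9180, 1.5), 2))
```

Shortening T raises ITR as much as raising P does. This is why a framework that cuts response time, such as one with asynchronous control, can increase ITR even when accuracy stays roughly constant.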