In this paper we propose a system for enabling multimodal web interactions using Web 2.0 and VoIP technologies, and a possible architecture to allow a smooth evolution from traditional multi-channel to multimodal Web 2.0 applications. The solution we propose is based on a mix of telecommunication and web technologies, which allows synergistic multimodal interaction using web pages and multimodal agents, possibly running on external mobile devices. We exploit the key ideas of Web 2.0: all inputs, especially audio signals, are processed on the server, and results are rendered by selecting the best combination of modes.