Satellite and aerial images are today the major source of information for land-cover classification. An important use of remotely sensed data is the extraction of urban regions to update GIS databases. In most cases, however, manual interpretation is not a sufficient solution, since human operators cannot process such an enormous amount of remotely sensed data. In addition, most existing automatic methods for urban extraction are sensitive to the atmospheric and radiometric parameters of the acquired image. In this paper we address the problem of urban area extraction using a visual representation concept known as "Bag of Words". This method, originally developed for text retrieval, has been applied successfully to scene image classification tasks. We introduce the "Bag of Words" approach into the analysis of aerial and satellite images. Because our method includes a normalization process, it is robust to changes in atmospheric conditions at acquisition time. The improved performance of the proposed method is demonstrated on IKONOS images. To assess the robustness of the method, learning and testing are performed on two different, independent images.
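
For readers unfamiliar with the representation, the following is a minimal sketch of a "Bag of Words" image descriptor with a normalization step. It is not the implementation evaluated in this paper. The patch size, the per-patch zero-mean, unit-variance normalization, the small k-means codebook, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def extract_patches(image, size=4, step=4):
    """Cut a 2-D grayscale image into flattened size x size patches."""
    rows, cols = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, rows - size + 1, step)
        for c in range(0, cols - size + 1, step)
    ]
    return np.asarray(patches, dtype=float)

def normalize_patches(patches, eps=1e-8):
    """Assumed normalization: zero mean and unit variance per patch,
    which cancels a global gain and offset in pixel intensity."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / (centered.std(axis=1, keepdims=True) + eps)

def assign_words(descriptors, centers):
    """Index of the nearest codebook entry (visual word) per descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(descriptors, k=8, iters=20, seed=0):
    """Plain k-means over descriptors; the k centers form the dictionary."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks]  # fancy indexing returns a copy
    for _ in range(iters):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # leave empty clusters unchanged
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(image, centers, size=4, step=4):
    """Relative frequency of each visual word in the image."""
    descriptors = normalize_patches(extract_patches(image, size, step))
    words = assign_words(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In this sketch the codebook would be built from the patches of a training image, and `bow_histogram` would then be computed for regions of a separate test image and passed to a classifier. Because each patch is normalized before it is assigned to a word, a uniform brightness or contrast change in the input leaves the histogram essentially unchanged.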