Very high resolution (VHR) satellite images make it possible to survey large urban areas quickly and at relatively low cost compared with aerial photography. In urban areas, a predominant use of satellite data is the generation of city models for applications such as mobile phone signal propagation or flood and catastrophe simulation. As more and more large, fast-growing cities emerge in developing countries, monitoring and modeling these areas from satellite will be the cheapest, if not the only, possibility. Most current methods for generating city models rely mainly on manual work. A method for the automatic derivation of models of urban environments, even if very coarse in a first step, would therefore be of great use. This paper presents a production chain and the methods used for such automatic modeling. The method is based on VHR satellite stereo imagery as provided, e.g., by IKONOS or QuickBird. In a first step, a digital surface model (DSM) is derived from the stereo data. Subsequently, a digital terrain model (DTM) and ortho images are created. Based on the local height differences between DSM and DTM and on the normalized difference vegetation index (NDVI), a coarse classification can be made. Based on this classification, object models can be selected and their parameters adapted to create an object-based representation of the satellite image scene. The method is evaluated and the results are discussed.
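The coarse classification step can be illustrated with a minimal sketch. It assumes co-registered grids of DSM heights, DTM heights, and near-infrared and red reflectances, and uses the standard definition NDVI = (NIR - Red) / (NIR + Red). The function names, the four class labels, and both thresholds (2.0 m height difference, NDVI of 0.3) are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index; 0.0 where NIR + Red is zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def classify(
    dsm: Grid,
    dtm: Grid,
    nir: Grid,
    red: Grid,
    height_thresh: float = 2.0,  # assumed threshold in metres, not from the paper
    ndvi_thresh: float = 0.3,    # assumed NDVI threshold, not from the paper
) -> List[List[str]]:
    """Label each cell from its DSM-DTM height difference and its NDVI.

    The class names are illustrative placeholders.
    """
    labels: List[List[str]] = []
    for r in range(len(dsm)):
        row: List[str] = []
        for c in range(len(dsm[r])):
            elevated = (dsm[r][c] - dtm[r][c]) > height_thresh
            vegetated = ndvi(nir[r][c], red[r][c]) > ndvi_thresh
            if elevated and vegetated:
                row.append("tree")
            elif elevated:
                row.append("building")
            elif vegetated:
                row.append("low_vegetation")
            else:
                row.append("ground")
        labels.append(row)
    return labels


if __name__ == "__main__":
    # Tiny 2x2 scene with made-up values.
    dsm = [[110.0, 100.5], [108.0, 100.0]]
    dtm = [[100.0, 100.0], [100.0, 100.0]]
    nir = [[0.2, 0.6], [0.6, 0.2]]
    red = [[0.2, 0.1], [0.1, 0.2]]
    for line in classify(dsm, dtm, nir, red):
        print(line)
```

Combining the two binary tests yields four coarse classes, which is one plausible way to separate elevated from ground-level objects and vegetated from non-vegetated ones before object models are selected; in practice the thresholds would need tuning to the sensor and scene.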