With the advent of advanced remote sensing technologies over the past few decades, acquiring higher-resolution satellite images has become easier and cheaper. At the same time, interpreting such huge volumes of data intelligently has become a major challenge for the remote sensing community. Deep learning, which offers efficient algorithms for extracting multiple levels of feature abstraction, may be well suited to this task. This letter presents a deep learning approach (Deep-STEP) for spatiotemporal prediction of satellite remote sensing data. The proposed learning architecture is derived from a deep stacking network and consists of a stack of multilayer perceptrons, each of which models the spatial features of the associated region at a particular time instant. The method is demonstrated on normalized difference vegetation index (NDVI) data sets, derived from satellite remote sensing imagery, containing several thousand to millions of pixels/records. The experimental results on NDVI prediction show that the proposed architecture performs satisfactorily and exhibits promising learning capabilities.
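The stacking idea described above can be illustrated with a minimal forward-pass sketch. It assumes the generic deep-stacking-network pattern, in which each module receives its own input concatenated with the output of the module below it; the exact wiring and training procedure of Deep-STEP are not given here, so the layer sizes, the `tanh` activation, the random untrained weights, the toy NDVI values, and the helper names `make_mlp`, `mlp_forward`, and `stack_forward` are all hypothetical.

```python
import math
import random


def make_mlp(n_in, n_hidden, n_out, rng):
    """Create random weights for a one-hidden-layer perceptron (hypothetical helper)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def mlp_forward(weights, x):
    """Forward pass of one module: input -> tanh hidden layer -> tanh output."""
    w1, w2 = weights
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]
    # tanh bounds every output to [-1, 1], the same range as NDVI values.
    return [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def stack_forward(modules, frames):
    """Run the stack: module t sees the frame at time t plus the previous output.

    frames[t] is a flattened NDVI patch (list of floats) for time instant t.
    Assumption: stacking is done by simple concatenation, as in a generic
    deep stacking network.
    """
    prev_out = []
    for weights, frame in zip(modules, frames):
        prev_out = mlp_forward(weights, frame + prev_out)
    return prev_out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_pixels, n_hidden, n_steps = 4, 3, 3

# The first module sees only its frame; later modules also see the previous output.
modules = [
    make_mlp(n_pixels + (n_pixels if t > 0 else 0), n_hidden, n_pixels, rng)
    for t in range(n_steps)
]

# Toy NDVI patches for three consecutive time instants (made-up values).
frames = [
    [0.20, 0.40, 0.10, 0.30],
    [0.25, 0.45, 0.15, 0.35],
    [0.30, 0.50, 0.20, 0.40],
]

# Output of the top module, read here as the predicted patch for the next instant.
prediction = stack_forward(modules, frames)
```

The weights here are untrained, so `prediction` only demonstrates the data flow through the stack, not a meaningful forecast; in practice each module would be fitted to historical NDVI data.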