We propose a real-time algorithm for the generic classification of humans and objects in 3D scenes. The algorithm does not depend on color information and works with depth data alone, making it flexible enough for a wide range of applications. Further, we show that it is highly robust to occlusion and gives correct classification results even in cases where only a fraction of a full human or object can be captured by the depth sensor. In contrast to current approaches based on deep networks, training the IRON-BAG classifier (a bag-of-words model for IRON features) can be done within minutes, making it easier to add new object classes, to fine-tune parameters, and to adapt the system to new operational scenarios. The system is easy to use, as it does not impose any constraints on the objects to detect: there is no limitation regarding shape, height, orientation, or position of humans and objects, and knowledge of the sensor pose or ground plane is not required. Instead of using depth images or point clouds as inputs for our classification pipeline, we operate solely on the Normal-Distributions-Transform map (NDT-map) data structure. NDT-maps provide a highly memory-efficient representation of depth data, and we show that the information contained within them is sufficient to accurately classify humans and objects in real-world 3D scenes at a rate of around 180 classifications per second on a single CPU core.
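To illustrate the NDT-map idea mentioned above, the following is a minimal sketch (not the paper's implementation): an NDT-map summarizes a point cloud by storing, per voxel, a Gaussian (mean and covariance) fitted to the points falling inside that voxel, which is why it needs far less memory than the raw cloud. The function name, cell size, and minimum-point threshold here are illustrative assumptions.

```python
import numpy as np

def build_ndt_map(points, cell_size=0.5):
    """Group 3D points into voxels and fit one Gaussian per voxel.

    Illustrative sketch only; parameters are assumptions, not the
    paper's settings.
    """
    # Bucket each point into its voxel by integer grid coordinates.
    cells = {}
    for p in points:
        key = tuple(np.floor(p / cell_size).astype(int))
        cells.setdefault(key, []).append(p)

    # Fit a Gaussian (mean, covariance) per sufficiently populated voxel.
    ndt = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) < 3:  # too few points for a stable 3x3 covariance
            continue
        ndt[key] = (pts.mean(axis=0), np.cov(pts.T))
    return ndt

# Example: 1000 random points in a 2 m cube, 1 m voxels.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 2.0, size=(1000, 3))
ndt = build_ndt_map(pts, cell_size=1.0)
print(len(ndt))  # number of occupied voxels (at most 8 here)
```

Each occupied voxel is thus reduced to nine numbers (a 3-vector mean and a symmetric 3x3 covariance), regardless of how many raw depth points it contained.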