Fisher Vectors (FV) and Convolutional Neural Networks (CNN) are two image classification pipelines with different strengths. While CNNs have shown superior accuracy on a number of classification tasks, FV classifiers are typically less costly to train and evaluate. We propose a hybrid architecture that combines their strengths: the first layers are unsupervised and rely on the FV, while the subsequent fully-connected layers are supervised and trained with back-propagation. We show experimentally that this hybrid architecture significantly outperforms standard FV systems without incurring the high cost that comes with CNNs. We also derive competitive mid-level features from our architecture that are readily applicable to other class sets and even to new tasks.
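To make the layering concrete, the following is a minimal pure-Python sketch of a forward pass through such a hybrid, not the authors' implementation. It assumes a diagonal-covariance GMM already fitted to local descriptors. It also assumes the standard FV formulation: gradients with respect to the Gaussian means and standard deviations, followed by power and L2 normalization. Layer sizes and random weights are arbitrary. All function names (`gmm_posteriors`, `fisher_vector`, `dense`, `hybrid_forward`) are hypothetical. The sketch reduces the unsupervised stage to a single FV encoding and omits the back-propagation training of the fully-connected layers.

```python
import math
import random

# Hypothetical helper: soft assignment of one descriptor to each diagonal Gaussian.
def gmm_posteriors(x, weights, means, sigmas):
    log_probs = []
    for w, mu, sd in zip(weights, means, sigmas):
        lp = math.log(w)
        for xi, mi, si in zip(x, mu, sd):
            lp += -0.5 * math.log(2 * math.pi * si * si) - 0.5 * ((xi - mi) / si) ** 2
        log_probs.append(lp)
    m = max(log_probs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical helper: unsupervised FV layer (mean and std-dev gradients, 2*K*D dims).
def fisher_vector(descriptors, weights, means, sigmas):
    T, K, D = len(descriptors), len(means), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sd = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]
                g_mu[k][d] += gamma[k] * u
                g_sd[k][d] += gamma[k] * (u * u - 1.0)
    fv = []
    for k in range(K):
        fv += [v / (T * math.sqrt(weights[k])) for v in g_mu[k]]
    for k in range(K):
        fv += [v / (T * math.sqrt(2 * weights[k])) for v in g_sd[k]]
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]  # power normalization
    norm = math.sqrt(sum(v * v for v in fv)) or 1.0
    return [v / norm for v in fv]  # L2 normalization

# Hypothetical helper: one fully-connected layer, optionally followed by ReLU.
def dense(x, W, b, relu):
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, o) for o in out] if relu else out

# Hypothetical forward pass: FV encoding, then supervised fully-connected layers.
def hybrid_forward(descriptors, gmm, layers):
    h = fisher_vector(descriptors, *gmm)
    for i, (W, b) in enumerate(layers):
        h = dense(h, W, b, relu=(i < len(layers) - 1))  # no ReLU on the score layer
    return h  # one score per class

# Toy usage: K Gaussians, D-dim descriptors, H hidden units, C classes (all arbitrary).
random.seed(0)
K, D, H, C = 2, 3, 8, 4
gmm = ([0.5, 0.5], [[0.0] * D, [1.0] * D], [[1.0] * D, [1.0] * D])
descs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(20)]
fv_dim = 2 * K * D
layers = [
    ([[random.gauss(0, 0.1) for _ in range(fv_dim)] for _ in range(H)], [0.0] * H),
    ([[random.gauss(0, 0.1) for _ in range(H)] for _ in range(C)], [0.0] * C),
]
scores = hybrid_forward(descs, gmm, layers)
print(len(scores))  # prints 4
```

In this sketch only the weights in `layers` would be learned by back-propagation, while the GMM parameters stay fixed after unsupervised fitting. This mirrors the split between unsupervised and supervised layers described above.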