The Fisher Vector (FV) representation of images can be seen as an extension of the popular bag-of-visual word (BOV). Both of them are based on an intermediate representation, the visual vocabulary built in the low level feature space. If a probability density function (in our case a Gaussian Mixture Model) is used to model the visual vocabulary, we can compute the gradient of the log likelihood with respect to the parameters of the model to represent an image. The Fisher Vector is the concatenation of these partial derivatives and describes in which direction the parameters of the model should be modified to best fit the data. This representation has the advantage to give similar or even better classification performance than BOV obtained with supervised visual vocabularies, being at the same time class independent. This latter property allows its usage both in supervised (categorization, semantic image segmentation) and unsupervised tasks (clustering, retrieval). In this paper we will show how it was successfully applied to these problems achieving state-of-the-art performances.