Despite the simplicity of keyword-based matching, text retrieval systems have achieved practical success in recent decades. Keywords, which carry meaningful semantics for users, can be extracted relatively easily from text documents. For visual content, which is perceptual in nature, the definition of corresponding "keywords" and their automatic extraction are unclear and non-trivial. Is there a similar metaphor or mechanism for visual data? In this chapter, we propose a new notion of visual keywords, which are abstracted and extracted by soft computing techniques from exemplary visual tokens tokenized from visual documents in a visual content domain. Each visual keyword is represented as a neural network or a soft cluster center. A visual content is indexed by comparing its visual tokens against the learned visual keywords; the resulting soft presences are aggregated spatially using contextual domain knowledge. A coding scheme based on singular value decomposition, similar to latent semantic indexing for text retrieval, is also proposed to reduce dimensionality and noise. An empirical study on professional natural-scene photograph retrieval and categorization is described to demonstrate the effectiveness and efficiency of visual keywords.
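To make the indexing pipeline concrete, the following NumPy sketch illustrates one possible reading of the three stages described above: soft assignment of visual tokens to learned keyword centers, spatial aggregation of the resulting soft presences, and SVD-based coding analogous to latent semantic indexing. All names, sizes, the softmax-style membership function, and the fixed 2x2 aggregation layout are illustrative assumptions, not the chapter's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: all sizes below are illustrative assumptions.
n_keywords = 8      # number of learned visual keywords (soft cluster centers)
dim = 16            # dimensionality of a visual token's feature vector
grid = 4            # each image is tokenized into a grid x grid layout of tokens

# Visual keywords would be learned offline; random stand-ins here.
keywords = rng.normal(size=(n_keywords, dim))

def soft_presence(tokens, centers, beta=1.0):
    """Fuzzy membership of each visual token to each visual keyword.

    Uses a softmax over negative squared Euclidean distances, a common
    soft-clustering assignment (one plausible form of 'soft presence').
    """
    d2 = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)       # rows sum to 1

def index_image(tokens, centers):
    """Aggregate soft presences spatially: sum memberships within coarse
    2x2 regions of the token grid, then concatenate into one signature."""
    p = soft_presence(tokens, centers)            # (grid*grid, n_keywords)
    cells = p.reshape(grid, grid, n_keywords)
    agg = cells.reshape(2, 2, 2, 2, n_keywords).sum(axis=(1, 3))
    return agg.reshape(-1)                        # (4 * n_keywords,)

# Index a small collection of (random stand-in) images.
signatures = np.stack([
    index_image(rng.normal(size=(grid * grid, dim)), keywords)
    for _ in range(20)
])

# SVD-based coding (analogous to latent semantic indexing): project onto
# the top-k singular directions to reduce dimensionality and noise.
k = 5
U, s, Vt = np.linalg.svd(signatures - signatures.mean(axis=0),
                         full_matrices=False)
coded = U[:, :k] * s[:k]                          # low-dimensional codes
print(coded.shape)
```

Retrieval would then compare query and database images in the `coded` space (e.g. by cosine similarity), just as LSI compares documents in the reduced term space.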