The application of machine learning techniques to the design of dialogue strategies is a growing research area. Most reinforcement learning methods use a tabular representation to learn the value of taking each action in each possible state, with the aim of maximizing the total reward. For large state spaces, this raises several difficulties, such as large tables, accounting for prior knowledge, and data sparsity. This paper investigates the performance of an online policy iterative reinforcement learning automata approach that handles large state spaces through a hierarchical organization of automata in order to learn an optimal dialogue strategy. The approach was compared with flat reinforcement learning methods, and the results show that the proposed method learns faster and scales to larger problems.