In this paper, a finite-horizon optimal control design for affine nonlinear discrete-time systems with completely unknown system dynamics is presented. First, a novel neural network (NN)-based identifier is utilized to learn the control coefficient matrix. This identifier is used together with an action-critic-based scheme to learn, online and forward in time, the time-varying solution of the Hamilton-Jacobi-Bellman (HJB) equation, also referred to as the value function. To handle the time-varying nature of the value function, NNs with constant weights and time-varying activation functions are considered. To satisfy the terminal constraint, an additional term is added to the novel update law. Uniform ultimate boundedness of the closed-loop system is demonstrated using standard Lyapunov theory. The effectiveness of the proposed method is verified by simulation results.
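To make the idea of "constant weights with time-varying activation functions" and an added terminal-constraint term more concrete, the following is a minimal sketch of a critic alone, not the update law of this paper. The quadratic basis scaled by the normalized time-to-go, the terminal cost `phi`, the learning rate, and the plain gradient-descent step on the squared Bellman residual plus the squared terminal residual are all hypothetical choices made for illustration. The identifier and the action network are omitted.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def features(x: Sequence[float], time_to_go: int, horizon: int) -> Vector:
    """Hypothetical time-varying activation sigma(x, N - k).

    Quadratic monomials of a 2-D state, stacked with the same monomials
    scaled by the normalized time-to-go tau = (N - k) / N. At the final
    step (tau = 0) only the first three entries can be nonzero.
    """
    x1, x2 = x
    quad = [x1 * x1, x1 * x2, x2 * x2]
    tau = time_to_go / horizon
    return quad + [tau * q for q in quad]


def value(w: Sequence[float], x: Sequence[float], time_to_go: int, horizon: int) -> float:
    """Critic estimate V(x, k) ~ w^T sigma(x, N - k); w does not depend on k."""
    return sum(wi * si for wi, si in zip(w, features(x, time_to_go, horizon)))


def residuals(w, x_k, x_next, k, horizon, stage_cost, terminal_cost):
    """Bellman residual at step k and terminal residual evaluated at x_k (assumption)."""
    e_bellman = (stage_cost
                 + value(w, x_next, horizon - k - 1, horizon)
                 - value(w, x_k, horizon - k, horizon))
    e_terminal = terminal_cost(x_k) - value(w, x_k, 0, horizon)
    return e_bellman, e_terminal


def critic_loss(w, x_k, x_next, k, horizon, stage_cost,
                terminal_cost: Callable[[Sequence[float]], float]) -> float:
    """0.5 * (Bellman residual^2 + terminal residual^2)."""
    e_b, e_t = residuals(w, x_k, x_next, k, horizon, stage_cost, terminal_cost)
    return 0.5 * (e_b ** 2 + e_t ** 2)


def critic_step(w, x_k, x_next, k, horizon, stage_cost,
                terminal_cost: Callable[[Sequence[float]], float], lr: float) -> Vector:
    """One gradient-descent step on critic_loss; the second term enforces V(x, N) ~ phi(x)."""
    e_b, e_t = residuals(w, x_k, x_next, k, horizon, stage_cost, terminal_cost)
    sig_k = features(x_k, horizon - k, horizon)
    sig_next = features(x_next, horizon - k - 1, horizon)
    sig_final = features(x_k, 0, horizon)
    # d(loss)/dw = e_b * (sigma_next - sigma_k) - e_t * sigma_final
    grad = [e_b * (sn - sk) - e_t * sf
            for sn, sk, sf in zip(sig_next, sig_k, sig_final)]
    return [wi - lr * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    phi = lambda x: x[0] ** 2 + x[1] ** 2  # hypothetical terminal cost
    w = [0.0] * 6
    for _ in range(1000):
        w = critic_step(w, (1.0, 0.5), (0.8, 0.3), 0, 10, 1.25, phi, lr=0.1)
    print(critic_loss(w, (1.0, 0.5), (0.8, 0.3), 0, 10, 1.25, phi))
```

Because the weights are shared across all time steps, the time dependence of the approximate value function enters only through `features`; the terminal residual pulls `value(w, x, 0, N)` toward `phi(x)`, which is the role the abstract assigns to the additional term in the update law.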