This letter is concerned with a class of nonbinary low-density parity-check (LDPC) codes, referred to as column-scaled LDPC (CS-LDPC) codes, whose parity-check matrices have the property that each column is a scaled binary vector. The CS-LDPC codes, which include algebraically constructed nonbinary LDPC codes as subclasses, admit fast encoding and decoding algorithms. Specifically, for a code over the finite field $\mathbb{F}_{2^p}$, the encoder can be implemented with $p$ parallel binary LDPC encoders followed by a series of bijective mappers, while the decoder can be implemented as an iterative decoder that requires no message permutations during the iterations. In addition, there exist low-complexity iterative multistage decoders that can be used to trade off performance against complexity. Simulation results show that the performance degradation caused by the iterative multistage decoding algorithms depends on the code structure.
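
To illustrate the encoder structure described above, the following is a minimal Python sketch, not the construction used in the letter. It assumes a toy field $\mathbb{F}_{2^3}$ defined by the primitive polynomial $x^3 + x + 1$, a small systematic binary matrix `B` (too small to be genuinely low-density), and arbitrary nonzero column scale factors `SCALES`; all function names and numerical values are hypothetical. Writing the parity-check matrix as $H = B\,\mathrm{diag}(h_1, \dots, h_n)$ with $B$ binary, a word $c$ satisfies $Hc = 0$ exactly when the scaled word $v$ with $v_j = h_j c_j$ satisfies $Bv = 0$, and because $B$ is binary this condition holds separately on each of the $p$ bit planes of $v$. Encoding can therefore run a binary encoder on each bit plane and then apply the bijective map $v_j \mapsto h_j^{-1} v_j$ to each symbol.

```python
# Toy sketch of column-scaled encoding over GF(2^p); all values are assumptions.
P_BITS = 3            # p: the field is GF(2^3)
PRIM_POLY = 0b1011    # x^3 + x + 1, assumed primitive polynomial


def gf_mul(a, b):
    """Multiply two elements of GF(2^p) in polynomial-basis representation."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << P_BITS):
            a ^= PRIM_POLY  # reduce modulo the primitive polynomial
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (fine for a tiny field)."""
    for x in range(1, 1 << P_BITS):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("zero has no inverse")


# Hypothetical binary base matrix B = [A | I], so binary encoding is systematic.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
K = 3                 # information symbols per codeword
N = 6                 # code length
B = [row + [1 if i == j else 0 for j in range(N - K)] for i, row in enumerate(A)]

SCALES = [1, 2, 3, 4, 5, 6]  # hypothetical nonzero scale factor h_j per column

# Column-scaled parity-check matrix: column j of H is h_j times column j of B.
H = [[SCALES[j] if B[i][j] else 0 for j in range(N)] for i in range(len(B))]


def binary_encode(bits):
    """Systematic binary encoder for B: append parity bits A * bits (mod 2)."""
    parity = [sum(a * u for a, u in zip(row, bits)) % 2 for row in A]
    return list(bits) + parity


def encode(msg_symbols):
    """Encode K symbols of GF(2^p) into a length-N codeword of H."""
    # Split the message into p binary planes, one per bit position.
    planes = [[(s >> b) & 1 for s in msg_symbols] for b in range(P_BITS)]
    # Run the same binary encoder on every plane (these could run in parallel).
    coded = [binary_encode(plane) for plane in planes]
    # Recombine the planes into the scaled symbols v_j.
    v = [sum(coded[b][j] << b for b in range(P_BITS)) for j in range(N)]
    # Bijective mapper per position: c_j = h_j^{-1} * v_j.
    return [gf_mul(gf_inv(SCALES[j]), v[j]) for j in range(N)]


def syndrome(word):
    """Compute H * word over GF(2^p); addition in the field is XOR."""
    out = []
    for row in H:
        s = 0
        for h_ij, c_j in zip(row, word):
            s ^= gf_mul(h_ij, c_j)
        out.append(s)
    return out
```

Calling `syndrome(encode([5, 0, 3]))` returns the all-zero syndrome `[0, 0, 0]`, which confirms that the output is a codeword of the scaled matrix `H` even though only the binary encoder for `B` was used.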