A common problem affecting the performance of bus-based computing systems is that the bus is a shared resource that must be divided among several master devices. Contention for this resource temporarily stalls one or more bus masters, slowing down execution. Moreover, the bus is usually relatively narrow, so a bus master must perform several bus cycles to transfer a data block from main memory to a peripheral (or to a processing element), and vice versa. Together, these factors lead to inefficiencies that designers need to address. In this paper we present a dedicated hardware device that allows an external accelerator to access system memory independently of the main microprocessor. The proposed device exchanges data with memory in a DMA-like fashion and generates appropriate memory addresses so that memory is accessed efficiently. Results show that this solution yields a considerable speed-up in the execution of a given algorithm.