This paper presents a system framework for localizing a camera relative to a 3D object. We adopt a coarse-to-fine strategy. First, a series of rendered computer graphics (CG) images is compared with a real camera image; an iterative linear search for the maximum mutual information (MI) over these comparisons yields an initial camera pose estimate. The pose estimate is then refined, depending on the kind of object, either by an edge-based object tracking method or by a further search for the maximum MI with a smaller step size. To eliminate the residual pose error caused by coordinate-transformation and camera-calibration errors, a visual servoing approach is adopted in the final step to adjust the camera to its desired pose. The implementation of this framework is described in detail, and experimental results show that the framework effectively localizes the camera at its desired pose relative to 3D objects.
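As an illustrative sketch (not the paper's actual implementation), the coarse step can be pictured as scoring each rendered CG image against the real camera image by mutual information and keeping the best-scoring pose. The histogram-based MI estimator and the `render_fn`/`candidate_poses` interface below are assumptions for illustration:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Estimate mutual information between two grayscale images
    from their joint intensity histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist / hist.sum()                 # joint probability
    px = pxy.sum(axis=1, keepdims=True)     # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)     # marginal of img_b
    nz = pxy > 0                            # skip zero cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def coarse_pose_search(real_image, render_fn, candidate_poses):
    """Return the candidate pose whose rendered image maximizes
    MI with the real camera image (the coarse search step)."""
    scores = [mutual_information(real_image, render_fn(p))
              for p in candidate_poses]
    return candidate_poses[int(np.argmax(scores))]
```

The fine stage described in the abstract would repeat this search around the coarse result with a smaller pose step size, or hand off to edge-based tracking.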