In this paper, we study several versions of optimization problems related to haplotype reconstruction/identification. The input to the first problem is a set C 1 of haplotypes, a set C 2 of haplotypes, and a set G of genotypes. The objective is to select the minimum number of haplotypes from C 2 so that together with haplotypes in C 1 they resolve all (or the maximum number of) genotypes in G. We show that this problem has a factor-O(logn) polynomial time approximation. We also show that this problem does not admit any approximation with a factor better than O(logn) unless P=NP. For the corresponding reconstruction problem, i.e., when C 2 is not given, the same approximability results hold.
The other versions of the haplotype identification problem are based on single individual haplotyping, including the well-known Minimum Fragment Removal (MFR) and Minimum SNP Removal (MSR), which have both shown to be APX-hard previously. We show in this paper that MFR has a polynomial time O(logn)-factor approximation. We also consider Maximum Fragment Identification (MFI), which is the complementary version of MFR; and Maximum SNP Identification (MSI), which is the complementary version of MSR. We show that, for any positive constant ε< 1, neither MFI nor MSI has a factor-n 1 − ε polynomial time approximation algorithm unless P=NP.