In human-human joint assembly work, two human workers complete the assembly of a toy car by together inserting a green wheel and a blue wheel into its wheelbases. The green wheel must be inserted first; only then may the blue wheel be inserted into its wheelbase. In this work, we propose a learning method that allows a robot to take the place of one of the human workers and interact with the other. In this human-robot joint assembly work, the robot has to complete the assembly of the toy car together with the remaining human worker. To this end, we use a unified framework consisting of four processes. Process (i) segments motion trajectories using a Gaussian Mixture Model (GMM) [1]. Process (ii) models motion primitives, which are represented as Dynamic Movement Primitives (DMPs). Process (iii) learns motion causalities: to find the pre- and post-conditions for task execution, we determine what the robot should direct its attention to, both in the motion trajectories of the robot and in all possible object-object and object-robot motion pairs. Here, pre- and post-conditions indicate what has to be checked to activate a motion primitive and what has changed after executing it, respectively. To obtain these conditions, significant variables are selected based on the spatial entropies of all motion pairs, and the resulting motion causalities are represented as Bayesian networks over the significant variables. Finally, process (iv) selects motion primitives according to the current and goal situations by using the motivation graph proposed in [2]. To evaluate the proposed method, a toy-car assembly task is performed in which a green wheel and a blue wheel are inserted into their wheelbases.
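To make the DMP representation of process (ii) concrete, the following is a minimal sketch of a one-dimensional discrete DMP in the commonly used spring-damper form with a phase-driven forcing term. It is not taken from this work: the function names (`make_basis`, `rollout_dmp`), the gains `alpha_z`, `beta_z`, and `alpha_x`, the width heuristic, and the explicit Euler integration are all illustrative assumptions.

```python
import math


def make_basis(n, alpha_x=4.0):
    """Place n >= 2 Gaussian basis functions along the phase variable x.

    Centers are spaced evenly in time (hence exponentially in x); the width
    rule is a common heuristic and an assumption of this sketch.
    """
    centers = [math.exp(-alpha_x * i / (n - 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha_x for c in centers]
    return centers, widths


def rollout_dmp(y0, goal, weights, centers, widths, tau=1.0, duration=1.0,
                dt=0.001, alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps.

    Assumed (hypothetical) dynamics:
        tau * dz/dt = alpha_z * (beta_z * (goal - y) - z) + f(x)
        tau * dy/dt = z
        tau * dx/dt = -alpha_x * x
    Returns the list of positions y, including the start position.
    """
    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(int(round(duration / dt))):
        # Gaussian basis activations at the current phase value.
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]
        total = sum(psi)
        f = 0.0
        if total > 1e-12:
            # Normalized weighted sum, gated by x so it fades as x decays.
            shape = sum(p * w for p, w in zip(psi, weights)) / total
            f = shape * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        path.append(y)
    return path
```

In this sketch the weights shape the transient of the motion, while the spring-damper term pulls the state toward the goal as the phase variable decays. With all weights set to zero the forcing term vanishes and the system reduces to a critically damped approach to the goal (for the assumed gains, `beta_z = alpha_z / 4`). In practice the weights would be fitted to a demonstrated trajectory segment, which is not shown here.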