This is a basic interview question. I have tried to keep my explanation short and understandable; if anything is still unclear, kindly let me know.
The answer is "yes", it does. In feed-forward propagation the activation function is applied directly to each layer's output. In backward propagation we move from layer N to layer (N-1), from layer (N-1) to layer (N-2), and so on, and at each step we compute the weight updates. By the chain rule, that computation multiplies in the derivative of the activation function (for example, the sigmoid's derivative) along with the derivative of the loss function to optimize the new weights. So yes, the activation function contributes through its derivative in these gradient calculations, as the sketch below illustrates.
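Below is a minimal sketch of this idea, assuming a toy single-layer network with a sigmoid activation and a mean squared error loss; all variable names (x, y, W, b, etc.) are illustrative, not from any particular library or course.

```python
import numpy as np

# Sigmoid activation and its derivative -- the derivative is the part
# that backpropagation actually uses when computing gradients.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Toy single-layer network: x -> (W, b) -> sigmoid -> prediction
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.integers(0, 2, size=(4, 1))  # binary targets
W = rng.normal(size=(3, 1))
b = np.zeros((1, 1))

# Forward propagation: the activation is applied directly.
z = x @ W + b
y_hat = sigmoid(z)

# Backward propagation (chain rule):
# dL/dW = dL/dy_hat * dy_hat/dz * dz/dW
# The middle term is the activation function's derivative -- this is
# exactly where the activation "contributes" to backpropagation.
dL_dyhat = 2.0 * (y_hat - y) / y.shape[0]     # derivative of MSE loss
dL_dz = dL_dyhat * sigmoid_derivative(z)       # derivative of sigmoid
dL_dW = x.T @ dL_dz                            # derivative w.r.t. weights
dL_db = dL_dz.sum(axis=0, keepdims=True)

# Gradient-descent step to obtain the new weights.
lr = 0.1
W -= lr * dL_dW
b -= lr * dL_db
```

The same pattern repeats layer by layer in a deeper network: each layer's gradient picks up one factor of its own activation's derivative as the error signal flows backward.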