One epoch is when an ENTIRE dataset is passed forward and backward through the neural network only once.
Since one epoch is usually too big to feed to the machine at once, we divide it into several smaller batches.
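As a minimal sketch of this idea (the dataset, its size, and the batch_size of 32 are all assumed values for illustration), one epoch is simply a loop over the dataset in batch-sized slices:

```python
import numpy as np

# Hypothetical dataset: 1,000 samples with 10 features each.
X = np.random.rand(1000, 10)
batch_size = 32  # assumed value; tune for your hardware and model

# One epoch = one full pass over the dataset, taken batch by batch.
for start in range(0, len(X), batch_size):
    batch = X[start:start + batch_size]
    # The forward pass, loss computation, and backward pass
    # would run here on each batch.
```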
Why more than one epoch?
Updating the weights with a single pass, i.e. one epoch, is usually not enough.
One epoch alone tends to leave the model under-fitting the data.
As the number of epochs increases, the weights are updated more and more times, and the fitted curve goes from under-fitting to optimal to over-fitting.
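A toy NumPy sketch of this progression (the sample counts, noise level, and learning rate are assumptions chosen so over-fitting shows up quickly): with more features than training samples, gradient descent first learns the signal, then typically starts fitting the noise, so the training loss keeps falling while the validation loss bottoms out and can creep back up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterized toy problem: 20 training samples, 50 features.
# Only the first 5 features carry signal; the rest is noise, so
# enough gradient steps will typically over-fit.
X_train = rng.normal(size=(20, 50))
X_val = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y_train = X_train @ w_true + 0.5 * rng.normal(size=20)
y_val = X_val @ w_true + 0.5 * rng.normal(size=200)

w = np.zeros(50)
lr = 0.01  # assumed learning rate

for epoch in range(1, 501):
    # One full-batch gradient step per epoch on mean squared error.
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    if epoch % 100 == 0:
        train_mse = np.mean((X_train @ w - y_train) ** 2)
        val_mse = np.mean((X_val @ w - y_val) ** 2)
        print(f"epoch {epoch}: train={train_mse:.3f} val={val_mse:.3f}")
```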
What is the right number of epochs?
Unfortunately, there is no single right answer; like many things, it comes with experience. The right number depends on how diverse the dataset is, how large it is, and many other factors.
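In practice, one common way to avoid guessing is early stopping: keep training while the validation loss improves and stop once it stalls. A minimal framework-agnostic sketch (the patience value and the simulated validation losses are assumptions; in real training, each value would come from evaluating the model after an epoch):

```python
# Simulated per-epoch validation losses standing in for real evaluations.
val_losses = [1.00, 0.70, 0.52, 0.45, 0.44, 0.46, 0.47, 0.50]

patience = 2        # assumed: stop after this many epochs without improvement
best = float("inf")
bad_epochs = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best:
        best = val_loss
        bad_epochs = 0   # improvement: reset the counter
    else:
        bad_epochs += 1  # no improvement this epoch
    if bad_epochs >= patience:
        print(f"early stopping at epoch {epoch} (best val loss {best:.2f})")
        break
```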