Regression is about returning a number instead of a class. In our case we're going to return 4 numbers (x0, y0, width, height) that describe a bounding box. You train this system with an image and a ground truth bounding box, and use the L2 distance between the predicted bounding box and the ground truth as the loss.
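To make the loss concrete, here is a minimal sketch of the L2 computation on two boxes, in plain Python. The function names `l2_loss` and `l2_distance` and the box values are made up for illustration. Whether a given implementation uses the squared distance or its square root is an assumption here; both are shown.

```python
import math

def l2_loss(pred, target):
    """Squared L2 distance between two boxes given as (x0, y0, width, height)."""
    if len(pred) != 4 or len(target) != 4:
        raise ValueError("each box must have exactly 4 values")
    # Sum of squared differences over the 4 box coordinates.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def l2_distance(pred, target):
    """Euclidean (L2) distance: the square root of the squared loss."""
    return math.sqrt(l2_loss(pred, target))

# Hypothetical example values, not taken from any dataset.
predicted = (48.0, 30.0, 100.0, 80.0)
ground_truth = (50.0, 32.0, 104.0, 77.0)

print(l2_loss(predicted, ground_truth))      # squared L2 loss
print(l2_distance(predicted, ground_truth))  # L2 distance
```

The loss is zero only when all four predicted numbers match the ground truth, and it grows as any of them drifts away, which is what gives the network a training signal.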