I know that the loss function denotes the error for a single training example, while the cost is the error over the entire training set. People also want to vectorize computation as much as possible for computational efficiency. The TensorFlow docs use the term loss: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
minimize(loss, global_step=None, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, name=None, grad_loss=None)
My confusion is whether the loss passed to the minimize function is the loss for a single training example. If it is, how does TensorFlow make use of vectorization, for example in the case of a convolutional neural network?
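
To make the question concrete, here is a minimal NumPy sketch (not TensorFlow code) of the distinction as I understand it. It assumes the value handed to minimize is a single scalar obtained by averaging the per-example losses over a batch; that is my assumption, not something the linked docs state. The names per_example_loss and batch_cost are hypothetical helpers made up for this illustration.

```python
import numpy as np

def per_example_loss(logits, labels):
    """Softmax cross-entropy for every row at once; returns shape (batch,)."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row.
    return -log_probs[np.arange(len(labels)), labels]

def batch_cost(logits, labels):
    """Reduce the per-example losses to one scalar (assumed: the mean)."""
    return per_example_loss(logits, labels).mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))   # 4 examples, 3 classes
labels = np.array([0, 2, 1, 2])

losses = per_example_loss(logits, labels)   # vectorized: one loss per example
cost = batch_cost(logits, labels)           # single scalar for the batch

# Same quantity computed one example at a time, for comparison.
looped = np.mean([
    per_example_loss(logits[i:i + 1], labels[i:i + 1])[0]
    for i in range(len(labels))
])
```

If this picture is right, the vectorized version and the example-by-example loop give the same scalar, and only that scalar would be passed on to the optimizer.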