How is Stochastic Gradient Descent (SGD) modified to be differentially private?

SGD works by randomly sampling a batch of training examples, computing the loss (the difference between the predicted and real values), computing the gradient of that loss, scaling the gradient by the learning rate, and using the result to update the model parameters. Repeating this process moves the parameters step by step toward lower loss, which is what’s meant by descent. There are a few main changes to this process to make it differentially private. First, the gradients are clipped so that no single training example can unduly impact the model. Second, random noise is added to the clipped gradients, which makes it hard to deduce which examples were included in the training. Additionally, instead of clipping the gradient once for the whole batch, the batch is split into micro-batches and each micro-batch gradient is clipped separately. In general, tighter clipping, more noise and more micro-batches give stronger privacy protection. As there is often a trade-off between privacy and utility, Gretel-synthetics exposes each of these elements as modifiable parameters in the training.
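
To make the steps concrete, here is a minimal NumPy sketch of a single differentially private update. It is an illustration under stated assumptions, not Gretel-synthetics' actual implementation: the function name `dp_sgd_step` and the parameter names `l2_norm_clip`, `noise_multiplier` and `num_microbatches` are chosen for this example, and the per-example gradients are assumed to be already computed.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, l2_norm_clip, noise_multiplier,
                num_microbatches, learning_rate, rng):
    """One illustrative DP-SGD update (hypothetical helper, not a library API).

    params:            array of shape (n_params,)
    per_example_grads: array of shape (batch_size, n_params)
    """
    batch_size, n_params = per_example_grads.shape
    if batch_size % num_microbatches != 0:
        raise ValueError("batch_size must be divisible by num_microbatches")

    # 1. Split the batch into micro-batches and average the gradient in each.
    micro_grads = per_example_grads.reshape(
        num_microbatches, batch_size // num_microbatches, n_params
    ).mean(axis=1)

    # 2. Clip each micro-batch gradient so its L2 norm is at most l2_norm_clip.
    norms = np.linalg.norm(micro_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = micro_grads * scale

    # 3. Add Gaussian noise to the sum of the clipped gradients. The noise
    #    standard deviation is proportional to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=n_params)
    noisy_mean = (clipped.sum(axis=0) + noise) / num_microbatches

    # 4. Ordinary gradient descent step using the noisy, clipped gradient.
    return params - learning_rate * noisy_mean


# Example call with made-up gradients for a batch of 4 examples.
rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.5, 0.0], [0.0, -2.0], [1.0, 1.0]])
new_params = dp_sgd_step(params, grads, l2_norm_clip=1.0,
                         noise_multiplier=1.1, num_microbatches=4,
                         learning_rate=0.1, rng=rng)
```

The noise scale is tied to the clipping norm because clipping bounds how much any one micro-batch can contribute to the sum, so the noise is sized relative to that bound. Setting `num_microbatches` equal to the batch size, as in the example call, clips every example individually.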