The complete list would be much longer, so we limit it to the most common techniques.
I-vector is based on Hidden Markov Models and Gaussian Mixture Models: two statistical models used to estimate speaker changes and to determine speaker vectors based on a set of known speakers. It is a legacy technique that can still be used.
X-vector and d-vector systems are based on neural networks trained to recognise a set of speakers. These systems generally perform better, but they require more training data and setup. Once the network is trained, the activations of one of its hidden layers are used as speaker vectors.
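As a rough illustration of how hidden-layer activations can become a speaker vector, the sketch below averages frame-level activations over a segment and normalises the result to unit length, in the spirit of d-vectors. The function name `pool_frames` and the toy numbers are assumptions made for this example; a real system would take the activations from a trained network and may use a different pooling scheme.

```python
import math


def pool_frames(frame_activations):
    """Average frame-level activations into one vector, then L2-normalise it."""
    if not frame_activations:
        raise ValueError("need at least one frame")
    dim = len(frame_activations[0])
    count = len(frame_activations)
    # Mean over frames, computed independently for each dimension.
    mean = [sum(frame[i] for frame in frame_activations) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return mean  # an all-zero vector cannot be normalised
    return [x / norm for x in mean]


# Hypothetical hidden-layer activations for three frames of one segment.
frames = [[1.0, 2.0, 2.0], [3.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
speaker_vector = pool_frames(frames)
```

Normalising to unit length is a common choice because it lets vectors from segments of different durations be compared on the same scale.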
ClusterGAN takes this a step further: it tries to transform an existing speaker vector into another one that contains better information, using three neural networks (a generator, a discriminator and an encoder) trained together in an adversarial setup.
Once this step is done, we have one speaker vector for each segment.
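To make the output of this step concrete, here is a minimal sketch of segments that each carry a speaker vector, together with a cosine similarity function to compare them. The `Segment` class, its field names, and the three-dimensional toy vectors are assumptions for illustration only; real speaker vectors usually have hundreds of dimensions.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    vector: list   # speaker vector computed for this segment


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors: the first two segments are meant to come from the same speaker.
segments = [
    Segment(0.0, 1.5, [0.9, 0.1, 0.0]),
    Segment(1.5, 3.0, [0.8, 0.2, 0.1]),
    Segment(3.0, 4.5, [0.0, 0.2, 0.9]),
]

same = cosine_similarity(segments[0].vector, segments[1].vector)
different = cosine_similarity(segments[0].vector, segments[2].vector)
```

With these toy values, `same` is much higher than `different`, which is the property that makes speaker vectors useful: segments spoken by the same person should end up close to each other.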