Published on

Generating unique IDs with the Snowflake algorithm

Generating unique IDs with the Snowflake algorithm Post Banner
Table of Contents

Unique IDs are pretty much used everywhere and in every application. They are used to identify a user, a product, a transaction, a session, etc. That way we can, for example, assign a transaction made to a specific user, and not by mistake another one. In this article, I'll show you how I generate unique IDs in my applications and give them a second usage as well.

How to get unique IDs?

There are several ways of getting a unique ID per user. Obviously some people probably do stuff like

const min = 1000000;
const max = 9999999999;
 
const id = Math.floor(Math.random() * max) + min;

Though I don't really recommend that as it is obviously not really unique. You can, and will, end up with twice the same ID generated. Of course you can check in the database if that ID already exists and generate a new one if it's the case, but you get my point

Auto incrementing integer

Although this is a very basic and widely used method, I'd rather not use this. An example on how this looks like in a database would be something close to that:

idusernamepasswordcreated_at
1johnsome_hash1667329399
2janesome_hash1667329404
3jacksome_hash1667329412

Here we can clearly see the IDs incrementing every time one by one.

UUID

UUIDs are a very common way of getting a unique ID. They are 128-bit numbers that are unique. UUIDs are very useful because they are unique, yes there is a tiny chance you can get the same ID generated twice, and they are not sequential. They are also very easy to generate. We can, for example, generate one via the terminal using the following command:

~ 🕙 20:07:00 uuidgen 67DB31FB-1887-4372-A8B0-E87C092D7D11

As you have expected, this UUID will take space in your database but that doesn't really matter that much, your database ends up looking like that when you use UUIDs:

idusernamepasswordcreated_at
45915234-74BE-499A-B7D5-D6D4882DA5B7johnsome_hash1667329399
885A26B8-C7CA-4C84-A871-0E89AF4997BDjanesome_hash1667329404
689E682B-35C8-4594-A0D7-5893626B7B1Djacksome_hash1667329412

Now let me introduce to you Spaceflake, a distributed generator to create unique IDs with ease; inspired by Twitter's Snowflake

What is a Spaceflake?

A Spaceflake is very simple, it's an ID based on the Snowflake algorithm of Twitter, which they are actively using, and it is slightly edited to fit my needs. The algorithm is pretty simple:

The structure of a Spaceflake
The structure of a Spaceflake

As you can see, a Spaceflake is composed of 4 main parts:

  • Timestamp: The timestamp is the milliseconds elapsed since the base epoch, it is not necessarily the Unix epoch, when the ID was generated.
  • Node ID: The node ID represents the ID of the node that generated the Spaceflake. It is a number between 0 and 31.
  • Worker ID: The worker ID represents the ID of the worker that generated the Spaceflake. It is a number between 0 and 31.
  • Sequence: The sequence is the number of Spaceflakes generated by the worker on the node. It is a number between 0 and 4095.

With these specifications of the algorithm, we can theoretically generate up to 4 million unique IDs per millisecond

Workers & Nodes = Spaceflake Network

A Spaceflake Network is a very basic concept where you have multiple independent nodes that themselves consist of multiple workers. These workers are the ones that can generate Spaceflakes.

Ideally a Spaceflake Network represents your entire application, or company. Each node represents a single service/server or application within the company, and each worker represents a single process which can generate a Spaceflake for a specific purpose. This way you can easily identify where a Spaceflake was generated by looking at its node ID and worker ID.

A Spaceflake Network
A Spaceflake Network

In the example above we can can consider Node 1 as being the API/backend of your application. The Worker (ID: 1) would be responsible for generating Spaceflakes for user IDs. The Worker (ID: 2) would be responsible for generating Spaceflakes for blog post IDs. That way we can easily identify where a Spaceflake was generated and for what purpose. Then we also have the Node 2 which may be the logging system of your entire infrastructure, and the IDs generated would be generated by the Worker (ID: 1).

In the end you are free to use them as you wish, just make sure you use these nodes and workers to be able to identify the Spaceflake - which is also the idea behind it.

Why did I create Spaceflakes?

Now you may ask me

"Why on Earth did you create this???"

As a matter of facts lots of people have already asked me this...

To explain my decision, let's go back to the example of the database above with UUIDs. I intentionally added a created_at field in the database as it is very useful to know when a user, or a post, was created. With Spaceflakes, and Snowflakes, you have the creation time of the ID built-in ((spaceflake >> 22) + BASE_EPOCH), so you don't need to store it separately in a new column of your database. This is very useful as it removes unnecessary columns, and it also makes it easier to sort your data by creation time. Your database would then look like that:

idusernamepasswordcreated_at
1037425364332843009johnsome_hash1667329399
1037425458566270978janesome_hash1667329404
1037425546889924611jacksome_hash1667329412

Another reason is that I use Go (duh) for all my different backends, and I wanted to implement my own way of generating unique IDs that are just like Snowflakes. The goal was to make it as easy as possible to use, and to be able to use it in any project I am working on.

Some issues I've faced during development

Go is fast

Gotta GO fast (ba dum tss)
Gotta GO fast (ba dum tss)

Yes, Go is fast; this was also my first issue. The reason behind that is that at first I made the sequence random, which was definitely not the best idea - though I've kept it in case I want to use it one day. The issue is that, as you've guessed, the sequence is from 0 to 4095, which means that it can only generate 4096 unique IDs per millisecond. This is not a lot, and it is definitely not enough to make sure to have a unique ID for each Spaceflake generated within a millisecond. So I had to change it to be incremental.

Incremental is not enough for bulk generation

So let's suppose someone wants to migrate their database to use Spaceflakes instead of UUIDs. They would have to generate a lot of Spaceflakes, and they would have to do it as fast as possible. This is where the issue comes in. If you generate a Spaceflake every millisecond, you can only generate 4096 Spaceflakes per millisecond. This is, once again, not a lot. So I had to make the node ID and the worker ID incremental as well when using the BulkGenerate() function. The code is a bit madness, but it's working (at least I think so).

// BulkGenerate will generate the amount of Spaceflakes specified and auto scale the node and worker IDs
func BulkGenerate(s BulkGeneratorSettings) ([]*Spaceflake, error) {
    node := NewNode(1)
    worker := node.NewWorker()
    worker.BaseEpoch = s.BaseEpoch
    spaceflakes := make([]*Spaceflake, s.Amount)
    for i := 1; i <= s.Amount; i++ {
        if i%(MAX12BITS*MAX5BITS*MAX5BITS) == 0 { // When the total amount of Spaceflakes generated is the maximum per millisecond, such as 3'935'295
            time.Sleep(1 * time.Millisecond) // We need to sleep to make sure the timestamp is different
            newNode := NewNode(1) // We set the node ID to 1
            newWorker := newNode.NewWorker() // And we create a new worker
            newWorker.BaseEpoch = s.BaseEpoch
            node = newNode
            worker = newWorker
        } else if len(node.workers)%MAX5BITS == 0 && i%(MAX5BITS*MAX12BITS) == 0 { // When the node ID is at its maximum value
            newNode := NewNode(node.ID + 1) // We need to create a new node
            newWorker := newNode.NewWorker() // And a new worker
            newWorker.BaseEpoch = s.BaseEpoch
            node = newNode
            worker = newWorker
        } else if i%MAX12BITS == 0 { // When the worker ID is at its maximum value
            newWorker := worker.Node.NewWorker() // We just create a new worker
            newWorker.BaseEpoch = s.BaseEpoch
            worker = newWorker
        }
        spaceflake, err := generateSpaceflakeOnNodeAndWorker(worker, worker.Node) // We generate the Spaceflake on the node and worker specified
        if err != nil {
            return nil, err
        }
        spaceflakes[i-1] = spaceflake // i starts at 1, so we need to subtract 1 to get the correct index
    }
    return spaceflakes, nil
}

Now we can make a bulk generation of up to 3'935'295 Spaceflakes per millisecond, which is a lot more than 4096 without auto scaling. Once we hit that maximal amount, we need to sleep for 1 millisecond to make sure the timestamp is different, and we need to create a new node and a new worker.

That is another problem solved

Spaceflakes are big numbers

Last but not least, I learned that big numbers are pain if not handled as string within an API. This was exactly what happened to me when using Spaceflakes for one of my API; I generated a Spaceflake - 144328692659220481 - and made an endpoint to get the user ID. When making a GET request to the endpoint with a browser, I got the following json:

{
  "id": 144328692659220480
}

Yes yes, there is a small difference - it's right here: 144328692659220480. Now this may be a small difference, but it is a difference and we do not want that. The only solution is to make the API return the ID as a string. On the client side, we can then convert it to a number if we want to - for example by using this:

spaceflakeID := "144328692659220481" // Will be the value returned by the API
id, _ := strconv.ParseUint(spaceflakeID, 10, 64)
sequence := spaceflake.ParseSequence(id)

This will then return the correct sequence of the Spaceflake.

How to use Spaceflakes?

Using Spaceflakes is super simple, if you have a Go project, you can just download the package with

go get github.com/kkrypt0nn/spaceflake

And then you can use it in your project by importing it and generate a Spaceflake:

package main
 
import (
    "fmt"
 
    "github.com/kkrypt0nn/spaceflake"
)
 
func main() {
    node := spaceflake.NewNode(5)
    worker := node.NewWorker() // If BaseEpoch not changed, it will use the EPOCH constant
    sf, err := worker.GenerateSpaceflake()
    if err != nil {
        panic(err)
    }
    fmt.Println(sf.Decompose()) // map[id:<Spaceflake> nodeID:5 sequence:1 time:<timestamp> workerID:1]
}

There are other examples you can check out in the examples folder of the repository.