雪花算法

简介:

雪花算法是Twitter开源的分布式ID生成算法 Github仓库地址
雪花算法主要用于分布式系统中,数据库的ID生成
在自然界中并不存在两片完全一样的雪花,每一片雪花都拥有自己漂亮独特的形状,独一无二.雪花算法也表示生成的分布式id如雪花般独一无二.

分布式ID

随着业务被使用的人越来越多, 单机的数据库已经很难保证业务能够流畅稳定的运行了, 这是我们需要对数据库进行分库分表存储, 使用分布式集群, 但是这样每个表的数据怎么保证ID唯一呢? 如果使用主键递增肯定发生ID不唯一的冲突情况, 所以急需一种可以生成全局唯一ID的算法来解决这个囧境.

分布式ID需要满足的要求

全局唯一:

这是最基本的要求

高性能:

不能为了全局唯一就去生成一大长串,肯定需要考虑性能,既要考虑生成的效率,又要考虑查询的效率(即存储的效率).

高可用:

生成分布式ID的服务要保证可用性高,无限接近100%

方便易用:

拿来即用，使用方便，快速接入！

安全:

分布式ID中不应含有敏感信息,否则了解算法的不怀好意之人解码可能会获取到这些敏感信息.

有序递增:

如果要把 ID 存放在数据库的话，ID 的有序性可以提升数据库写入速度。并且，很多时候，我们还很有可能会直接通过 ID 来进行排序。

要求具体的业务含义:

生成的ID如果能有具体的业务含义,可以让定位问题以及开发更透明化(例如根据ID就能确定是哪个业务)

独立部署:

也就是分布式系统单独有一个发号器服务，专门用来生成分布式 ID。这样就生成 ID 的服务可以和业务相关的服务解耦。不过，这样同样带来了网络调用消耗增加的问题。总的来说，如果需要用到分布式 ID 的场景比较多的话，独立部署的发号器服务还是很有必要的。(企业级的大型项目中十分有必要)

组成

雪花算法生成的ID是一个64bit的long型的数字且按时间趋势递增.大致有首位符号位(无效位), 时间戳插值, 机器编码, 序列号四部分组成.

如图:

首位无效符: 主要用做为符号位,因为一般都是生成正数,所以符号位统一都是0
**时间戳:**占用41bit,精确到毫秒. 41bit位最好可以表示2^41-1毫秒, 转化成单位年为69年.
机器编码: 占用10bit,其中高位5bit是数据中心ID,低位5bit是工作节点ID,最多可以容纳1024个节点.
**序列号:**占用12bit,每个节点每毫秒0开始不断累加,最多可以累加到2^12-1,一共可以生成4096个ID(包括了0)

代码实现

Java代码实现

snowFlake

package com.dyw.snowFlake;

/**
 * @author Devil
 * @create 2022-04-04 14:25
 */
@SuppressWarnings("all")
public class IDWorker {
    //十位的工作机器码
    private long workerId; //工作id 五位
    private long datacenterId; //数据中心id 五位

    //12位序列号
    private long sequence = 0L;

    //初始时间戳
    private final long twEpoch = 1288834974657L;

    //长度为5位
    private final long workerIdBits = 5L;
    private final long datacenterIdBits = 5L;
    //最大值
    private final long maxWorkerId = ~(-1L << workerIdBits);
    private final long maxDatacenterId = ~(-1L << datacenterIdBits);

    //序列号id长度
    private final long sequenceBits = 12L;
    private final long sequenceMask = ~(-1L << sequenceBits);

    //工作id需要左移的位数, 12位(序列号的位长)
    private final long workerIdShift = sequenceBits;
    //数据中心id需要左移的位数 序列号长+工作id长
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    //时间戳左移位数 = 序列号长+工作id长+工作位长
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    //上次时间戳, 初始值位负值
    private long lastTimestamp = -1L;

    /**
     * 构造方法
     * @param workerId 工作节点id
     * @param datacenterId 数据中心id
     */
    public IDWorker(long workerId, long datacenterId) {
        //检查参数的合法性
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getSequence() {
        return sequence;
    }

    /**
     * //下一个ID生成算法
     * @return snowflakeId
     */
    public synchronized long nextId() {
        //先获取当前系统时间
        long timestamp = timeGen();
        //如果当前系统时间比上次获取id时间戳小就抛出异常 时钟往后移动可能会出现同样id所以这里必须抛异常结束执行
        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.",lastTimestamp);
            throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds",
                    lastTimestamp - timestamp));
        }

        //获取当前时间戳如果等于上次时间戳(同一毫秒内),则在序列号加一,否则序列号赋值为0, 从零开始
        if(timestamp==lastTimestamp){
            //这是使用&sequenceMask是为了防止sequence溢出12位(前面要求了sequence的长度只能是12位)
            sequence = (sequence+1)&sequenceMask;
            //如果防止刚好移除经过&sequenceMask后 会变成0 可能会发生重复的情况
            //所以此时需要再次获取时间戳,并于上次时间戳作比较 直到与上次时间戳不一致返回当前时间戳避免重复
            if(sequence==0){
                timestamp = tilNextMillis(lastTimestamp);
            }
        }//如果不在同一个时间戳中 代表该序列刚开始计数所以初始为0
        else{
            sequence = 0;
        }

        //将上次时间戳值更新
        lastTimestamp = timestamp;


        /*
         * 返回结果：
         * (timestamp - TwEpoch) << timestampLeftShift) 表示将时间戳减去初始时间戳，再左移相应位数
         * (datacenterId << datacenterIdShift) 表示将数据id左移相应位数
         * (workerId << workerIdShift) 表示将工作id左移相应位数
         * | 是按位或运算符，例如：x | y，只有当x，y都为0的时候结果才为0，其它情况结果都为1。
         * 因为个部分只有相应位上的值有意义，其它位上都是0，所以将各部分的值进行 | 运算就能得到最终拼接好的id
         */
        return ((timestamp - twEpoch)<<timestampLeftShift) |
                (datacenterId<<datacenterIdShift) |
                (workerId<<workerIdShift)|
                sequence;



    }

    /**
     * 获取时间戳,并于上次时间戳作比较
     * @param lastTimestamp 上一次获取的时间戳
     * @return timestamp 更新后的系统时间
     */
    private long tilNextMillis(long lastTimestamp){
        long timestamp = timeGen();
        while(timestamp<=lastTimestamp){
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * 获取系统时间戳
     * @return 系统时间戳
     */
    private long timeGen() {
        return System.currentTimeMillis();
    }


}

Go语言实现

package al

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var mutex = sync.Mutex{}

const (
	//初始时间戳
	twEpoch int64 = 1288834974657
	//长度为5位
	workerIdBits     int64 = 5
	datacenterIdBits int64 = 5

	//最大值
	maxWorkerId     int64 = -1 ^ (-1 << workerIdBits)
	maxDatacenterId int64 = -1 ^ (-1 << datacenterIdBits)

	//序列号id长度
	sequenceBits int64 = 12
	sequenceMask       = -1 ^ (-1 << sequenceBits)

	//工作id需要左移的位数, 12位(序列号的位长)
	workerIdShift = sequenceBits

	//数据中心id需要左移的位数 序列号长+工作id长
	datacenterIdShift = sequenceBits + workerIdBits

	//时间戳左移位数 = 序列号长+工作id长+工作位长
	timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits
)

//上次时间戳, 初始值位负值
var lastTimestamp int64 = -1

type IDWorker struct {
	//十位的工作机器码
	workerId     int64 //工作id 五位
	datacenterId int64 //数据中心id 五位
	//12位序列号
	sequence int64
}

func InitIDWorker(workerId, datacenterId int64) (*IDWorker, error) {
	//检查参数合法性
	if workerId > maxWorkerId || workerId < 0 {
		var err = errors.New(fmt.Sprintf("worker Id can't be greater than %d or less than 0", maxWorkerId))
		return nil, err
	}
	if datacenterId > maxDatacenterId || datacenterId < 0 {
		var err = errors.New(fmt.Sprintf("datacenter Id can't be greater than %d or less than 0", maxDatacenterId))
		return nil, err
	}
	fmt.Printf("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d",
		timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId)
	return &IDWorker{
		datacenterId: datacenterId,
		workerId:     workerId,
	}, nil
}

/*
	下一个ID生成算法
*/
func (i *IDWorker) NextId() (id int64, err error) {
	//上锁
	mutex.Lock()
	//程序结束 释放锁
	defer mutex.Unlock()
	//先获取当前系统时间
	var timestamp = timeGen()
	//如果当前系统时间比上次获取id时间戳小就抛出异常 时钟往后移动可能会出现同样id所以这里必须抛异常结束执行
	if timestamp < lastTimestamp {
		fmt.Printf("clock is moving backwards.  Rejecting requests until %d.", lastTimestamp)
		err = errors.New(fmt.Sprintf("Clock moved backwards.  Refusing to generate id for %d milliseconds",
			lastTimestamp-timestamp))
	}
	//获取当前时间戳如果等于上次时间戳(同一毫秒内),则在序列号加一,否则序列号赋值为0, 从零开始
	if timestamp == lastTimestamp {
		//这是使用&sequenceMask是为了防止sequence溢出12位(前面要求了sequence的长度只能是12位)
		i.sequence = (i.sequence + 1) & sequenceMask
		//如果防止刚好移除经过&sequenceMask后 会变成0 可能会发生重复的情况
		//所以此时需要再次获取时间戳,并于上次时间戳作比较 直到与上次时间戳不一致返回当前时间戳避免重复
		if i.sequence == 0 {
			timestamp = tilNextMillis(lastTimestamp)
		}
	} else {
		//如果不在同一个时间戳中 代表该序列刚开始计数所以初始为0
		i.sequence = 0
	}
	//将上次时间戳值更新
	lastTimestamp = timestamp

	//返回雪花算法生成的id
	id = ((timestamp - twEpoch) << timestampLeftShift) |
		(i.datacenterId << datacenterIdShift) |
		(i.workerId << workerIdShift) |
		i.sequence
	return
}

func (i IDWorker) WorkerId() int64 {
	return i.workerId
}
func (i IDWorker) DatacenterId() int64 {
	return i.datacenterId
}

func (i IDWorker) Sequence() int64 {
	return i.sequence
}

/*
	获取系统时间戳
*/
func timeGen() int64 {
	return time.Now().UnixMilli()
}

/*
	获取时间戳,并于上次时间戳作比较
*/
func tilNextMillis(lastTimestamp int64) int64 {
	var timestamp = timeGen()
	for timestamp <= lastTimestamp {
		timestamp = timeGen()
	}
	return timestamp
}