Counting the 1 bits in a number's binary representation (Hamming weight)

Notes from studying https://en.wikipedia.org/wiki/Hamming_weight :

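To count the nonzero bits of a number, the direct idea is to add up all of its bits: the sum is the number of 1s.
Take the simplest case, a 2-bit number such as 11 (or 10 or 01). Just add its second bit to its first bit,
i.e. (n & 1) + (n >> 1) [this can be seen as the divide-and-remainder method: a right shift divides by 2, and the remainder is the bit that gets added]
A longer number can be split into adjacent pairs of bits, each handled the same way: add the bit in the even position to the bit in the odd position (positions counted from 1 at the right).
For example, take a number whose binary form is abcd (each of a, b, c, d is 0 or 1). The number of 1s is a+b+c+d, computed pair by pair as [a+b]+[c+d].
To separate the even and odd positions within each pair, AND with the mask 0101:
odd-position bits:  0b0d = abcd & 0101
even-position bits: 0a0c = (abcd >> 1) & 0101
The sum 0b0d + 0a0c is then organized in groups of 2 bits, and each group holds the number of 1s originally in those 2 bits (call the result efgh, where ef is a+b and gh is c+d).
Count for the low two bits:  gh = efgh & 0011
Count for the high two bits: ef = (efgh >> 2) & 0011
ef + gh is the number of 1s in the whole 4-bit number.
Likewise, a wider number continues by adding adjacent 4-bit groups: (hijklmn & 00001111) + ((hijklmn >> 4) & 00001111)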
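
The 4-bit walk-through above can be sketched in C as follows. This is only an illustration of the two masking steps; the function name popcount4 is made up for this note, and the assert is there just to state the assumption that the input fits in 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 1 bits of a 4-bit value abcd.
   popcount4 is a hypothetical name used only in this sketch. */
unsigned popcount4(uint8_t n)
{
    assert(n <= 0xF);                              /* assumption: only 4 bits are used */
    uint8_t pairs = (n & 0x5) + ((n >> 1) & 0x5);  /* 0b0d + 0a0c = efgh */
    return (pairs & 0x3) + ((pairs >> 2) & 0x3);   /* gh + ef */
}
```

For 1011 the first step gives 0001 + 0101 = 0110 (ef = 01, gh = 10), and the second step gives 10 + 01 = 11, i.e. three 1 bits.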

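This is a divide-and-conquer approach:
first count the 1s in each adjacent group of 2 bits,
then in each group of 4 bits, then 8 bits, 16 bits, and 32 bits.
Since 32 = 2^5, a 32-bit number needs only 5 such bit-manipulation statements.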
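
As a sketch of the five-step idea for a 32-bit value, assuming the same mask pattern as the 64-bit constants in these notes, truncated to 32 bits. The names popcount32, popcount32_loop and check_popcount32 are hypothetical and not taken from the Wikipedia article.

```c
#include <assert.h>
#include <stdint.h>

/* Divide and conquer: five mask-and-add steps, one per doubling of group width. */
uint32_t popcount32(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >>  1) & 0x55555555u); /* counts per  2 bits */
    x = (x & 0x33333333u) + ((x >>  2) & 0x33333333u); /* counts per  4 bits */
    x = (x & 0x0F0F0F0Fu) + ((x >>  4) & 0x0F0F0F0Fu); /* counts per  8 bits */
    x = (x & 0x00FF00FFu) + ((x >>  8) & 0x00FF00FFu); /* counts per 16 bits */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu); /* count for all 32 bits */
    return x;
}

/* Reference version: add the bits one at a time. */
uint32_t popcount32_loop(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}

/* Debug helper: aborts if the two versions disagree on x. */
void check_popcount32(uint32_t x)
{
    assert(popcount32(x) == popcount32_loop(x));
}
```

The loop version takes up to 32 iterations, while the divide-and-conquer version always runs the same five statements regardless of the input.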

//types and constants used in the functions below
//uint64_t is an unsigned 64-bit integer type (defined in <stdint.h> since the C99 version of C)
#include <stdint.h>
const uint64_t m1  = 0x5555555555555555; //binary: 0101...
const uint64_t m2  = 0x3333333333333333; //binary: 00110011..
const uint64_t m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
const uint64_t m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
const uint64_t hff = 0xffffffffffffffff; //binary: all ones
const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

//This is a naive implementation, shown for comparison,
//and to help in understanding the better functions.
//This algorithm uses 24 arithmetic operations (shift, add, and).
int popcount64a(uint64_t x)
{
    x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits 
    x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits 
    x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits 
    x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits 
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits 
    x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits 
    return x;
}

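Optimization
For a 2-bit value ab, the difference ab - 0a equals the number of 1s in ab.
Simple proof: if a is 0, then 0a = 00 and ab - 00 = 0b is unchanged, so the result is b, which is the count.
If a is 1, then 10 - 01 = 01 and 11 - 01 = 10, so in both cases ab - 0a is the number of 1s in ab.
Because ab >= 0a in every pair, the subtraction never borrows from the neighboring pair.
So x -= (x >> 1) & m1 gives the same result as x = (x & m1) + ((x >> 1) & m1) while saving one operation.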
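
The claim can be checked by brute force. The following is a minimal sketch, assuming it is enough to sweep all 16-bit patterns replicated across the 64-bit word; PAIR_MASK has the same value as m1, and the function names are made up for this check.

```c
#include <assert.h>
#include <stdint.h>

static const uint64_t PAIR_MASK = 0x5555555555555555ULL; /* same value as m1 */

/* Per-pair counts using the mask-and-add form. */
uint64_t pair_count_add(uint64_t x)
{
    return (x & PAIR_MASK) + ((x >> 1) & PAIR_MASK);
}

/* Per-pair counts using the subtraction form (one operation fewer). */
uint64_t pair_count_sub(uint64_t x)
{
    return x - ((x >> 1) & PAIR_MASK);
}

/* Aborts if the two forms ever differ on the tested inputs. */
void verify_pair_identity(void)
{
    for (uint64_t v = 0; v <= 0xFFFF; v++) {
        /* copy the 16-bit pattern v into all four 16-bit lanes */
        uint64_t x = v * 0x0001000100010001ULL;
        assert(pair_count_add(x) == pair_count_sub(x));
    }
}
```

This sweep is not a proof for every 64-bit input, but since each pair is processed independently, covering all values of a pair in every position exercises every case of the argument above.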

//This uses fewer arithmetic operations than any other known  
//implementation on machines with slow multiplication.
//This algorithm uses 17 arithmetic operations.
int popcount64b(uint64_t x)
{
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    x += x >>  8;  //put count of each 16 bits into their lowest 8 bits
    x += x >> 16;  //put count of each 32 bits into their lowest 8 bits
    x += x >> 32;  //put count of each 64 bits into their lowest 8 bits
    return x & 0x7f;
}

//This uses fewer arithmetic operations than any other known  
//implementation on machines with fast multiplication.
//This algorithm uses 12 arithmetic operations, one of which is a multiply.
int popcount64c(uint64_t x)
{
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    return (x * h01) >> 56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ... 
}
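
Why the multiply in popcount64c works: multiplying by 0x0101010101010101 adds x, x<<8, x<<16, and so on, so the top byte of the product ends up holding the sum of all eight bytes of x. The sketch below isolates that step; it assumes the byte sum is below 256 (true after the m4 step, where each byte is at most 8), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add the eight bytes of x one at a time. */
unsigned byte_sum_loop(uint64_t x)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (unsigned)(x & 0xFF);
        x >>= 8;
    }
    return sum;
}

/* Same sum with a single multiply and shift.
   Only valid when the byte sum is below 256, so that no partial
   sum carries into the next byte of the product. */
unsigned byte_sum_mul(uint64_t x)
{
    assert(byte_sum_loop(x) < 256); /* precondition check, for this sketch only */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```

For example, the bytes 08 07 06 05 04 03 02 01 sum to 36 either way. In popcount64c the largest possible sum is 64, so the precondition always holds.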