-
Notifications
You must be signed in to change notification settings - Fork 11
Open
Description
uint64_t
encode_ten_thousands(uint64_t hi, uint64_t lo)
{
uint64_t merged = hi | (lo << 32);
/* Truncate division by 100: 10486 / 2**20 ~= 1/100. */
uint64_t top = ((merged * 10486ULL) >> 20) & ((0x7FULL << 32) | 0x7FULL);
/* Trailing 2 digits in the 1e4 chunks. */
uint64_t bot = merged - 100ULL * top;
uint64_t hundreds;
uint64_t tens;
/*
* We now have 4 radix-100 digits in little-endian order, each
* in its own 16 bit area.
*/
hundreds = (bot << 16) + top;
/* Divide and mod by 10 all 4 radix-100 digits in parallel. */
tens = (hundreds * 103ULL) >> 10;
tens &= (0xFULL << 48) | (0xFULL << 32) | (0xFULL << 16) | 0xFULL;
//tens += (hundreds - 10ULL * tens) << 8;
tens = (hundreds << 8) - 2559ull * tens;
return tens;
}Explanation:
decimal: a*10+b=x
BCD: a+256*b=a+256*(x-a*10)=a+256*x-2560*x=256*x-2559*a=256*x-2559*(x/10)
I haven't conducted performance tests, but the above implementation has fewer instructions and fewer instruction dependencies, so there should be some performance improvement.
Metadata
Metadata
Assignees
Labels
No labels