add optimized SHA1 implementation

This optimized implementation of the SHA1 algorithm is about 28%
faster than the old one (on sapphire hardware), but it assumes a
little-endian host.  Add it, and keep using the old implementation
on big-endian hardware.
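
A minimal sketch of the selection described above, not the code in this
change: every identifier here (sha1_impl, host_is_little_endian,
sha1_select_impl) is hypothetical, and the real change may well choose
the implementation at build time rather than at run time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tags for the two implementations named in this message. */
typedef enum {
    SHA1_IMPL_OLD,       /* existing implementation, endian-neutral */
    SHA1_IMPL_OPTIMIZED  /* new implementation, little-endian only */
} sha1_impl;

/* Runtime probe: returns 1 if the lowest-addressed byte of a 32-bit
 * value holds its least significant byte, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Pick the optimized code only when the host is little-endian;
 * otherwise fall back to the old implementation. */
static sha1_impl sha1_select_impl(int little_endian)
{
    return little_endian ? SHA1_IMPL_OPTIMIZED : SHA1_IMPL_OLD;
}
```

A caller would use sha1_select_impl(host_is_little_endian()) once at
startup and dispatch on the result.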
2 files changed