Compare commits
3 Commits
5bc55414a3
...
3b126e7fa6
| Author | SHA1 | Date | |
|---|---|---|---|
| 3b126e7fa6 | |||
| 40ba11f33c | |||
| f0468502d8 |
9
README.md
Normal file
9
README.md
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# F8: 8-bit floats
|
||||
|
||||
> ⚠️ Not for production use!
|
||||
|
||||
F8 is a toy software floating-point math library.
|
||||
It provides an 8-bit floating point type `F8`, with 5 mantissa bits, 3 exponent bits, and no sign bit.
|
||||
The format used resembles [IEEE 754] binary formats but stripped down to the bare necessities: the only special value supported is zero.
|
||||
|
||||
[IEEE 754]: https://en.wikipedia.org/wiki/IEEE_754
|
||||
19
src/lib.rs
19
src/lib.rs
|
|
@ -31,7 +31,22 @@ const E_BIAS: u8 = E_STORAGE_MAX - E_MAX;
|
|||
const M_MASK: u8 = M_STORAGE_MAX;
|
||||
const E_MASK: u8 = E_STORAGE_MAX << M_BITS;
|
||||
|
||||
/// 8-bit unsigned binary floating-point type, with [`M_BITS`] mantissa bits and [`E_BITS`] exponent bits.
|
||||
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
|
||||
/// 8-bit unsigned binary floating-point type.
|
||||
///
|
||||
/// # Properties
|
||||
///
|
||||
/// * Mantissa width: 5 bits ([`M_BITS`])
|
||||
/// * Exponent width: 3 bits ([`E_BITS`])
|
||||
/// * Negative values: not supported
|
||||
/// * Zero: special-cased
|
||||
/// * Subnormals: not supported
|
||||
/// * Infinity: not supported
|
||||
/// * NaN: not supported
|
||||
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct F8(u8);
|
||||
|
||||
impl F8 {
|
||||
pub const ZERO: Self = Self(0);
|
||||
pub const ONE: Self = Self::merge(0, E_BIAS);
|
||||
}
|
||||
|
|
|
|||
Loading…
Reference in New Issue
Block a user