Performance of denormal numbers

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: Bonita.Montero@nospicedham.gmail.com (Bonita Montero)
Newsgroups: comp.lang.asm.x86,comp.lang.c++
Subject: Performance of denormal numbers
Date: Sat, 17 Sep 2022 16:35:18 +0200
Organization: A noiseless patient Spider
Lines: 74
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <tg4luh$69gq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="2f117e31ee01762fd86da727ef25db19";
logging-data="207361"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/KX06mm4kg3gdP8BxN/PIMbg1F2+4wsQs="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:acCk7nA3uYhPzma+mSnOfAF/Ecw=

I wanted to check whether denormal numbers still have a performance
penalty on modern CPUs. Intel introduced the FTZ bit with SSE (and the
DAZ bit a little later) because denormals were handled by slow
microcode assists:

#include <iostream>
#include <cstdint>
#include <chrono>
#include <cmath>

using namespace std;
using namespace chrono;

// Implemented in MASM (see below): runs `rounds` rounds of 52 dependent
// multiplications and returns the total number of multiplications.
uint64_t denScale( uint64_t rounds, bool den );

int main()
{
    // Returns the average time per multiplication in nanoseconds.
    auto bench = []( bool den ) -> double
    {
        constexpr uint64_t ROUNDS = 25'000'000;
        auto start = steady_clock::now();
        uint64_t nScale = denScale( ROUNDS, den );
        return (double)duration_cast<nanoseconds>(
            steady_clock::now() - start ).count() / nScale;
    };
    double
        tDen = bench( true ),
        tNorm = bench( false ),
        rel = tDen / tNorm - 1; // relative slowdown of the denormal case
    cout << "denormal: " << tDen << " ns/mul" << endl;
    cout << "normal:   " << tNorm << " ns/mul" << endl;
    // slowdown in percent, rounded to one decimal place
    cout << round( 1000 * rel ) / 10 << "%" << endl;
}
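The driver only declares denScale; its definition is the MASM routine. For readers without MASM, here is a hypothetical portable C++ stand-in (not the author's code) that follows the same scheme: each round restarts from the same value and performs 52 dependent multiplications by 0.5. It assumes the compiler is not run with -ffast-math or similar options, and it relies on volatile to keep the chain from being hoisted out of the loop or removed, which adds a load and a store per round compared with the assembly version.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical portable stand-in for the MASM denScale (not the author's code).
// Runs `rounds` rounds of 52 dependent multiplications by 0.5, each round
// starting from 2^-1023 (den == true) or 1.0, and returns rounds * 52.
uint64_t denScale( uint64_t rounds, bool den )
{
    // Same bit patterns as the DEN and ONE constants in the MASM listing.
    const uint64_t bits = den ? 0x0008000000000000ULL : 0x3FF0000000000000ULL;
    double first;
    std::memcpy( &first, &bits, sizeof first );

    // volatile forces a fresh load of the start value every round, so the
    // loop-invariant chain cannot be hoisted, and the store to sink keeps
    // the chain from being discarded as dead code.
    volatile double start = first;
    volatile double sink = 0.0;
    for( uint64_t r = 0; r != rounds; ++r )
    {
        double x = start;
        for( int i = 0; i != 52; ++i )
            x *= 0.5; // dependent chain: each multiply waits for the previous one
        sink = x;
    }
    return rounds * 52;
}
```

Compile it together with the driver in place of the MASM file (linking both would define denScale twice). Timings may differ from the assembly version because the compiler, not the programmer, chooses the exact instruction sequence.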

MASM code:

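PUBLIC ?denScale@@YA_K_K_N@Z        ; uint64_t denScale( uint64_t, bool )

CONST SEGMENT
DEN DQ 00008000000000000h           ; 2^-1023, a denormal
ONE DQ 03FF0000000000000h           ; 1.0
P5  DQ 03FE0000000000000h           ; 0.5
CONST ENDS

_TEXT SEGMENT
; Windows x64 calling convention: rcx = rounds, dl = den, result in rax
?denScale@@YA_K_K_N@Z PROC
    xor rax, rax                    ; rounds == 0 -> return 0
    test rcx, rcx
    jz byeBye
    mov r8, ONE                     ; start value 1.0 ...
    mov r9, DEN
    test dl, dl
    cmovnz r8, r9                   ; ... or the denormal if den is true
    movq xmm1, P5                   ; multiplier 0.5
    mov rax, rcx                    ; keep rounds for the return value
loopThis:
    movq xmm0, r8                   ; restart the chain every round
    REPT 52                         ; 52 dependent multiplications
    mulsd xmm0, xmm1
    ENDM
    sub rcx, 1
    jnz loopThis                    ; exactly `rounds` iterations
    mov rdx, 52
    mul rdx                         ; rax = rounds * 52 multiplications
byeBye:
    ret
?denScale@@YA_K_K_N@Z ENDP
_TEXT ENDS
END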
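The three constants are raw IEEE 754 double bit patterns: DEN is 2^-1023 (its exponent field is zero, so it is a denormal), ONE is 1.0 and P5 is 0.5. The sketch below decodes them and checks why 52 repetitions is a sensible count, assuming the default round-to-nearest mode with FTZ/DAZ off: starting from DEN, every multiplication has a denormal input, the first 51 results are still denormal (the 51st is the smallest one, 2^-1074), and the 52nd rounds to +0.0; starting from ONE, all results stay normal. The helper names fromBits, halve and checkConstants are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterprets a 64-bit pattern from the CONST segment as a double
// (std::bit_cast from <bit> does the same in C++20).
double fromBits( uint64_t bits )
{
    double d;
    std::memcpy( &d, &bits, sizeof d );
    return d;
}

// One round of the kernel's chain: multiply x by 0.5 `steps` times.
double halve( double x, int steps )
{
    for( int i = 0; i != steps; ++i )
        x *= 0.5;
    return x;
}

// Checks the properties described above.
void checkConstants()
{
    const double den = fromBits( 0x0008000000000000ULL );          // DEN
    assert( den == std::ldexp( 1.0, -1023 ) );
    assert( std::fpclassify( den ) == FP_SUBNORMAL );
    assert( fromBits( 0x3FF0000000000000ULL ) == 1.0 );             // ONE
    assert( fromBits( 0x3FE0000000000000ULL ) == 0.5 );             // P5
    assert( halve( den, 51 ) == std::numeric_limits<double>::denorm_min() );
    assert( halve( den, 52 ) == 0.0 );  // 2^-1075 is a tie, rounds to even (zero)
    assert( std::fpclassify( halve( 1.0, 52 ) ) == FP_NORMAL );    // 2^-52
}
```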

On my PC, normal numbers have a 25% higher throughput than denormals.
Feel free to post your results as well.
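For comparison, the FTZ and DAZ bits mentioned at the start can be set from C++ through the MXCSR intrinsics. This minimal sketch assumes an x86-64 compiler that provides <xmmintrin.h> and <pmmintrin.h> (GCC, Clang and MSVC do) and a process that starts with the default MXCSR; mulRuntime and mulWithFtzDaz are hypothetical helper names. With DAZ set, the denormal input is read as zero, so the product is +0.0 instead of 2^-1024. Setting these bits changes IEEE 754 semantics for every SSE operation in the thread until MXCSR is restored.

```cpp
#include <cassert>
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE

// Multiplies at run time. The volatile operands and result stop the compiler
// from folding the product or moving it across the MXCSR updates.
double mulRuntime( double a, double b )
{
    volatile double va = a, vb = b;
    volatile double r = va * vb;
    return r;
}

// Computes a * b with FTZ (MXCSR bit 15) and DAZ (bit 6) set, then restores
// the previous MXCSR value, including its rounding mode and status flags.
double mulWithFtzDaz( double a, double b )
{
    const unsigned int saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE( _MM_FLUSH_ZERO_ON );
    _MM_SET_DENORMALS_ZERO_MODE( _MM_DENORMALS_ZERO_ON );
    const double r = mulRuntime( a, b );
    _mm_setcsr( saved );
    return r;
}
```

Calling mulWithFtzDaz( 0x1p-1023, 0.5 ) returns 0.0, while mulRuntime with the same arguments returns the denormal 2^-1024 as long as FTZ/DAZ are off.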
