I just read about the IEEE 754 standard in order to understand how single-precision and double-precision floating points are implemented. So I wrote this to check my understanding: #include <stdio.h> #include <float.h> int main() { double foo = 9007199254740992; // 2^53 double bar = 9007199254740993; // 2^53 + 1 printf("%d\n\n",...
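The 2^53 boundary in the excerpt above can be checked in a few lines. This is a minimal sketch in Python rather than the asker's C program, assuming Python's `float` is an IEEE 754 double (true on common platforms).

```python
# Above 2^53, not every integer fits in a double's 53-bit significand.
foo = float(9007199254740992)   # 2^53, exactly representable
bar = float(9007199254740993)   # 2^53 + 1, has to be rounded

same = foo == bar               # round-half-to-even sends 2^53 + 1 back to 2^53
next_up = float(2**53 + 2)      # the next representable double above 2^53

print(same)
print(int(bar), int(next_up))
```

Between 2^53 and 2^54 the spacing of doubles is 2, so odd integers in that range cannot be stored exactly.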

I have four double precision real numbers (n1, n2, n3, & n4) in an array (n). The weird thing is that when I calculate the sum of these four numbers within a DO loop and then calculate the sum directly, I don't get exactly the same number! Note that I am...
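Floating-point addition is not associative, so two summation orders can legitimately disagree. A small sketch of that effect, using hypothetical values (not the asker's data) chosen to make the difference obvious:

```python
import math

# Hypothetical values; the large terms swallow the small ones when added first.
n = [1e16, 1.0, -1e16, 1.0]

loop_sum = 0.0
for x in n:                # ((1e16 + 1.0) + -1e16) + 1.0
    loop_sum += x          # 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost

regrouped = (n[0] + n[2]) + (n[1] + n[3])   # cancel the large terms first
compensated = math.fsum(n)                  # correctly rounded sum of the inputs

print(loop_sum, regrouped, compensated)
```

With realistic data the discrepancy is usually in the last digit or two rather than a whole unit, but the cause is the same: each intermediate result is rounded.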

I'm building a program to convert double values into scientific format (mantissa, exponent). Then I noticed the following: 369.7900000000000 -> 3.6978999999999997428 and 68600000 -> 6.8599999999999994316. I noticed the same pattern for several other values as well. The maximum fractional error is 0.000 000 000 000 001 = 1e-15. I know...
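Part of the error comes from the input itself: 369.79 has no exact binary representation. A sketch that exposes the stored value, assuming Python's `decimal` module as the inspection tool (the asker's own conversion routine is not shown in the excerpt, so its additional rounding steps are not modeled here):

```python
from decimal import Decimal

stored = Decimal(369.79)      # exact decimal expansion of the nearest double
wanted = Decimal("369.79")    # the decimal value that was typed
error = abs(stored - wanted)  # representation error before any arithmetic

print(stored)
print(error)
print(repr(369.79))           # shortest string that round-trips to the same double
```

Repeatedly dividing or multiplying by 10 to extract the mantissa adds further rounding on top of this initial error.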

From the C++11 header <random>, I was wondering if a std::uniform_real_distribution<double> object can spit out a double that's greater than 0.99999999999999994? If so, multiplying this value by 2 would equal 2. Example: std::default_random_engine engine; std::uniform_real_distribution<double> dist(0,1); double num = dist(engine); if (num > 0.99999999999999994) num = 0.99999999999999994; int test1 =...
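The arithmetic side of this question can be checked independently of any random engine. A sketch, assuming IEEE 754 doubles; it says nothing about which values a particular `std::uniform_real_distribution` implementation actually returns:

```python
largest_below_one = 1.0 - 2.0**-53   # largest double strictly below 1.0
literal = 0.99999999999999994        # the constant from the excerpt

# The decimal literal rounds to that same double.
literal_is_largest = literal == largest_below_one

doubled = largest_below_one * 2      # exact: only the exponent changes
print(literal_is_largest, doubled, int(doubled))
```

Multiplying by 2 never rounds (barring overflow), so any double below 1.0 stays below 2.0 after doubling.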

I am trying to compile a program written in FORTRAN that plots graphs using the DISLIN libraries, but all data is in double precision. I cannot lose this precision, so converting everything to single precision is not an option. When I attempt to link to the double precision libraries (_d),...

I have trouble with the following code. It does not give the same answer in C# as in Excel, and I'm sure the Excel answer is correct. I have already tried to change integer values to double by adding a decimal point. Please advise. The Excel version: =((1+BZ21)*BX21*CG21*(B21*1))-(0.5*BZ21*BX21*CG21*((C21*0)+(0*D21))) The C# version...

I have been in the process of writing a FORTRAN code for numerical simulations of an applied physics problem for more than two years and I've tried to follow the conventions described in Fortran Best Practices. More specifically, I defined a parameter as integer, parameter:: dp=kind(0.d0) and then used it...

I have noticed a small error on some arithmetic calculations using double. It is really weird, there's always a small error and/or an extra significant digit. First I am using atof to convert a number that has two significant digits that I am reading from a text file (then I...

Math.Pow does not seem to work correctly for big results. That is probably because it uses double for calculations (How is Math.Pow() implemented in .NET Framework?). For example: public static void Main() { Console.WriteLine((long)Math.Pow(17, 13)); Console.WriteLine(Pow(17, 13)); } public static long Pow(int num, int pow) { long answer = 1;...
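The mismatch follows from the size of the result: 17^13 is an odd integer above 2^53, so no double can hold it exactly. A sketch of that check in Python, which stands in for the C# code only to show the rounding:

```python
exact = 17**13                 # Python ints are arbitrary precision
as_double = float(exact)       # the best a double-based pow could return

above_limit = exact > 2**53    # beyond the range where all integers are exact
off_by = abs(int(as_double) - exact)

print(exact, int(as_double), above_limit, off_by)
```

An integer multiplication loop, like the `Pow` helper in the excerpt, avoids the problem as long as the result fits in a 64-bit `long`.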

I was solving this problem on SPOJ: http://www.spoj.com/problems/ATOMS/. I had to give the integral part of log(m / n) / log(k) as output. I had declared m, n, and k as long long. When I was calculating it using long doubles, I was getting a wrong answer, but when I used...
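When only the integral part of a logarithm is needed, it can be computed without floating point at all. A sketch under the assumption that the required value is the largest t with n * k^t <= m, as the excerpt's formula suggests; `floor_log_ratio` is a hypothetical helper, not part of any judge's reference solution:

```python
def floor_log_ratio(m: int, n: int, k: int) -> int:
    """Largest t >= 0 with n * k**t <= m, using integer arithmetic only."""
    if k < 2 or n < 1 or m < n:
        raise ValueError("expects k >= 2 and m >= n >= 1")
    t = 0
    while n * k <= m:   # one more factor of k still fits
        n *= k
        t += 1
    return t


print(floor_log_ratio(1000, 1, 10))
```

In C or C++ the same loop needs a guard against overflow of `n * k`, for example by comparing `n` with `m / k` instead.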

I need an acos() function with double precision within a compute shader. Since there is no built-in function of acos() in GLSL with double precision, I tried to implement my own. At first, I implemented a Taylor series like the equation from Wiki - Taylor series with precalculated factorial values....

I'm writing a piece of code to convert double values to scientific notation up to a precision of 15 in C++. I know I can use standard library functions like sprintf with the %e option to do this. But I need to come up with my own solution. I'm trying something like...

I'm going to boil this problem down to the simplest form: Let's iterate over [0 .. 5.0] with a step of 0.05 and print out 'X' for every multiple of 0.25. for(double d=0.0; d<=5.0; d+=0.05) { if(fmod(d,0.25) is equal 0) print 'X'; } This will of course not work since d...
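A common way around this is to drive the loop with an integer counter and derive the double from it, so the multiple-of-0.25 test becomes an exact integer test. A minimal sketch, assuming the step divides the interval evenly (the name `steps_per_unit` is introduced here for illustration):

```python
steps_per_unit = 20                       # 1 / 0.05, assumed to be an exact integer ratio
marks = []

for i in range(5 * steps_per_unit + 1):   # i = 0 .. 100 covers 0.0 .. 5.0
    d = i / steps_per_unit                # one rounding per value, no accumulated error
    if i % 5 == 0:                        # every 5th step is a multiple of 0.25
        marks.append(d)

print(len(marks), marks[:3], marks[-1])
```

The loop bound is also exact this way, so the final value 5.0 is never skipped because of drift in `d`.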

I have to implement a program that calculates the machine epsilon for float and double. I wrote these functions: int feps(){ //machine epsilon for float float tmp=1; int d=0; while(1+(tmp=tmp/2)>1.0f)d++; return d; } int deps(){ //machine epsilon for double double tmp=1; int d=0; while(1+(tmp=tmp/2)>1.0)d++; return d; } Note: 64 bit...
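For comparison, the same halving loop for double only, checked against the value the runtime reports. This is a sketch in Python, which has no native single-precision type, so the float case from the excerpt is not covered:

```python
import sys

eps = 1.0
halvings = 0
while 1.0 + eps / 2 > 1.0:   # stop once adding the next half no longer changes 1.0
    eps /= 2
    halvings += 1

matches_runtime = eps == sys.float_info.epsilon
print(halvings, eps, matches_runtime)
```

In C, intermediate results may be kept in a wider register (for example on x87), which can make the loop run longer than expected unless the sum is stored to a variable of the intended type first.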