I have implemented a script that performs constrained optimization to solve for the optimal parameters of a Support Vector Machine model. I noticed that, for some reason, my script gives inaccurate results (although very close to the real value). For example, the typical situation is that the result of a calculation should...

I don't know if it is possible to do this, but I need to split a floating point number into a sum of two numbers... For example, assuming x is a floating point number and we want to split it as x = I + f, where I is the signed...
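
The excerpt is cut off before it defines I and f, so the following is only a minimal Python sketch under the assumption that I is the integer part and f the fractional part, both carrying the sign of x. The helper name `split_float` is hypothetical; `math.modf` is the standard-library call that does the work.

```python
import math

def split_float(x: float) -> tuple[float, float]:
    """Return (integer_part, fractional_part), both carrying the sign of x."""
    frac, whole = math.modf(x)  # math.modf returns (fractional, integer)
    return whole, frac

whole, frac = split_float(-3.75)
print(whole, frac)            # both parts are negative for a negative input
print(whole + frac == -3.75)  # the two parts add back up to x
```

The values here were chosen to be exactly representable in binary; for inputs such as 0.1 the fractional part carries the same representation error as the input itself.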

Is it possible to get division by 0 (or infinity) in the following example? public double calculation(double a, double b) { if (a == b) { return 0; } else { return 2 / (a - b); } } In normal cases it will not, of course. But what if...
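
As a rough illustration, here is a Python port of the function in the question (the original is Java, so this is a sketch rather than the asker's code). It assumes IEEE 754 doubles with gradual underflow, where two finite values that compare unequal have a nonzero difference. Under that assumption the subtraction does not produce zero, but the quotient can still overflow to infinity, and NaN inputs skip the equality check entirely.

```python
import math

def calculation(a: float, b: float) -> float:
    """Python port of the question's function, for illustration only."""
    if a == b:
        return 0.0
    return 2 / (a - b)

tiny = 5e-324                           # smallest positive subnormal double
print(tiny - 0.0 != 0.0)                # the difference is not zero
print(calculation(tiny, 0.0))           # the quotient overflows to infinity
print(calculation(math.nan, math.nan))  # NaN != NaN, so the result is NaN
```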

I have a series I get from an outside source (x). It's all positive, and is mostly zero. x.describe() count 23275.000000 mean 0.015597 std 0.411720 min 0.000000 25% 0.000000 50% 0.000000 75% 0.000000 max 26.000000 dtype: float64 However, running rolling_sum on it produces values smaller than zero. Why does it...

I have to do calculations in my application. I have formulas like this one: result = Capital * rate / (1- 1/(1+ rate)^frequence) I have read on the internet that doing calculations with floats can be lossy. Should I use NSDecimalNumber in my situation?...

I have a number in the decimal system, of type double. I convert its fractional part in a loop, like this: double part; part = part - int(part); for (auto i = 0; i < ACCURACY; i++) /* precision */ { part *= typeEncode; result += std::to_string(int(part));...

For my deterministic physics engine, I need to confirm that calculations with doubles in C# are consistent enough across multiple platforms. Does anyone know how much the following functions differ in results? On my computer as a Windows 32 bit application, these are the results (Note: Pseudo-code): double x =...

Suppose I have equally spaced doubles (64 bit floating point numbers) x0,x1,...,xn. Equally spaced means that for all i, x(i+1) - xi is constant; call it w for width. Given a number y in the range [x0,xn] I want to find the largest i such that xi <= y. A...

When I do a floating point addition I get different results. My database is 32 bit Kognitio. Can someone explain why this is a problem when my floating point values are well within the limits? I do understand that the operations involving floating point numbers are...

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory. Example: A 640 bit number has 640 digits in base 2, but only ten digits in base...

I'm trying to calculate the true course from one point to another on the surface of the Earth in as few CPU cycles as possible. The result should be a double 0 <= tc < 360, however in a few special cases I get the result 360 (should be reported...

In the following code, why is there a comparison against float.Epsilon and not 0? // Coroutine to move elements protected IEnumerator SmoothMovement (Vector3 end) { // Distance computation float sqrRemainingDistance = (transform.position - end).sqrMagnitude; while(sqrRemainingDistance > float.Epsilon) { Vector3 newPostion = Vector3.MoveTowards( rb2D.position, end, inverseMoveTime * Time.deltaTime ); rb2D.MovePosition (newPostion);...

I want to create a vector containing dates in MATLAB. For that I specified the start time and the stop time: WHM01_start = datenum('01-JAN-2005 00:00') WHM01_stop = datenum('01-SEP-2014 00:00') and then I created the vector with WHM01_timevec = WHM01_start:datenum('01-JAN-2014 00:20') - datenum('01-JAN-2014 00:00'):WHM01_stop; after I want to have time steps...

I have noticed a small error in some arithmetic calculations using double. It is really weird: there's always a small error and/or an extra significant digit. First I am using atof to convert a number that has two significant digits that I am reading from a text file (then I...

I'm using Python's ctypes library to call my C code. My problem is that when I try to create a c_float, I seem to obtain a slightly different value to what I set. For example print(value) print(c_float(value)) 0.2 c_float(0.20000...298...) How can I avoid this?...
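
To make the effect concrete, here is a small Python sketch using only `ctypes` from the standard library. It assumes the usual platform where `c_float` is a 32-bit IEEE 754 single and Python's `float` is a 64-bit double. Storing 0.2 in a `c_float` rounds it to the nearest single-precision value, and reading it back widens that rounded value to a double again.

```python
from ctypes import c_double, c_float

value = 0.2
as_single = c_float(value).value   # rounded to 32 bits, then widened again
as_double = c_double(value).value  # same width as a Python float

print(as_single == value)             # False: the single is a different number
print(as_double == value)             # True: no rounding takes place
print(abs(as_single - value) < 1e-8)  # True: the discrepancy is tiny
```

If the C function can take a `double` instead, `c_double` keeps the value unchanged. Otherwise the difference is inherent to the 32-bit type rather than something `ctypes` introduces.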

For example, the code below gives an undesirable result due to the limited precision of floating point numbers. double a = 1 / 3.0; int b = a * 3; // b will be 0 here I wonder whether similar problems will show up if I use mathematical functions. For example int...
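
A quick way to see how truncation interacts with rounding error is to try it in Python, whose `float` is an IEEE 754 double on common platforms (an assumption of this sketch). With round-to-nearest, this particular product happens to round to exactly 1.0, so the truncated result is 1. A different expression shows the kind of failure the question is worried about.

```python
a = 1 / 3.0
b = int(a * 3)
print(a * 3 == 1.0, b)  # the product rounds to exactly 1.0 here

# A case where truncation really does drop a unit:
c = int((0.1 + 0.7) * 10)
print(c)                        # 7, because the product is just below 8
print(round((0.1 + 0.7) * 10))  # 8, rounding instead of truncating
```

Rounding to the nearest integer before converting is the usual defence when the mathematically exact result is known to be an integer.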

I am looking to calculate 9^19. My code is: cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); cout<<pow(9,19)<<endl; The result has the last 2 digits equal to 0: 1350851717672992000. In Python, 9**19 gives 1350851717672992089L. It seems to be a floating point issue. How could I raise the precision for pow, or how can I perform a better precision...
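
The gap between the two results can be reproduced in Python alone, as a sketch assuming the usual 64-bit double for `float`. Python integers are arbitrary precision, so `9 ** 19` is exact. Converting that value to a double rounds it, because the number exceeds 2**53 and doubles in that range can no longer represent every integer.

```python
exact = 9 ** 19        # arbitrary-precision integer arithmetic
approx = float(exact)  # rounded to the nearest 64-bit double

print(exact)                 # 1350851717672992089
print(exact > 2 ** 53)       # True: beyond the range of exact integers
print(int(approx) == exact)  # False: the low digits were rounded away
```

In C++ the analogous fix is integer exponentiation in a 64-bit integer type, since 9^19 still fits below 2^63.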

What is the difference between the following two? float f1 = some_number; float f2 = some_near_zero_number; float result; result = f1 / f2; and: float f1 = some_number; float f2 = some_near_zero_number; float result; result = (double)f1 / (double)f2; I am especially interested in very small f2 values which may produce...

I found Stevens Computing Services – K & R Exercise 2-1 to be a very thorough answer to K&R 2-1. This slice of the full code computes the maximum value of a float type in the C programming language. Unfortunately, my theoretical understanding of float values is quite limited. I know they...

I would like to store latitudes and longitudes very precisely in my MySQL database with InnoDB. However, float did not offer enough internal decimal places, so I switched to double. To my surprise, MySQL accepted double with a size up to 30, so I used double(30,27)...

Is it a good practice to use larger precision when computing a sum and reduce the precision at the end of the algorithm? Like float average(const float* begin, const float* end) { double sum=0; size_t N=end-begin; while(begin!=end) { sum+=(double)(*begin); ++begin; } return (float)( sum/N); //Assume range is not empty }...
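
The effect of the wider accumulator can be sketched in Python by emulating 32-bit arithmetic. The helper `to_f32` and both averaging functions are hypothetical names introduced for this illustration. They round through `struct` after each step, which is assumed to match what a C `float` accumulator would do on an IEEE 754 platform.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def average_f32_accumulator(values):
    """Average with the running sum rounded to binary32 after every add."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return to_f32(acc / len(values))

def average_f64_accumulator(values):
    """Average with a binary64 running sum, rounded to binary32 at the end."""
    acc = 0.0
    for v in values:
        acc += v
    return to_f32(acc / len(values))

v = to_f32(0.1)
data = [v] * 100_000
print(average_f32_accumulator(data) == v)  # False: rounding error piles up
print(average_f64_accumulator(data) == v)  # True: the wide sum is exact here
```

For this input the double-precision sum is exact, so the final narrowing is the only rounding step. In general the wider accumulator reduces the accumulated error but does not eliminate it.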