Fastest use of a dataset of just over 64 bytes?

c,cpu-cache
Structure: I have 8 64-bit integers (512 bits = 64 bytes, the assumed cache line width) that I would like to compare to another, single 64-bit integer, in turn, without cache misses. The data set is, unfortunately, absolutely inflexible -- it's already as small as possible. Access pattern: Each uint64_t...
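A minimal sketch of the layout this question describes, under the assumption of a 64-byte cache line (the struct and function names here are illustrative, not from the question):

```cpp
#include <cstdint>

// Hypothetical sketch: 8 keys packed into one 64-byte-aligned block so a
// single cache line fill brings in the entire data set.
struct alignas(64) KeyBlock {
    uint64_t keys[8];
};

// Returns the index of the first key equal to `needle`, or -1 if none.
// After the first access, the loop touches only the one resident line.
int find_key(const KeyBlock& block, uint64_t needle) {
    for (int i = 0; i < 8; ++i)
        if (block.keys[i] == needle)
            return i;
    return -1;
}
```

Because `alignas(64)` keeps the block from straddling two lines, every probe after the first is served from cache.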

Data structure in .Net keeping heterogeneous structs contiguous in memory

c#,.net,data-structures,cpu-cache
I'm looking for a data structure in .Net that keeps heterogeneous structs contiguous in memory in order to be CPU-cache-friendly. This type of data structure is explained in this blog post: T-machine.org, Iteration 4. In .Net, an array of value types (structs) keeps data contiguous in memory, but...

Why does my 8M L3 cache not provide any benefit for arrays larger than 1M?

c++,c,performance,optimization,cpu-cache
I was inspired by this question to write a simple program to test my machine's memory bandwidth in each cache level: Why vectorizing the loop does not have performance improvement My code uses memset to write to a buffer (or buffers) over and over and measures the speed. It also...
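A minimal sketch of the kind of benchmark kernel described (my own reconstruction, not the asker's code): repeatedly `memset` a buffer of a given size and report bytes written per second. Sizes that fit in a given cache level should show higher throughput than sizes that spill to the next level.

```cpp
#include <chrono>
#include <cstring>
#include <vector>

// Hypothetical bandwidth probe: time `iterations` memsets of `bytes` bytes
// and return GB/s. The volatile read afterwards keeps the compiler from
// discarding the stores as dead.
double memset_bandwidth_gbps(std::size_t bytes, int iterations) {
    std::vector<char> buf(bytes);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        std::memset(buf.data(), i & 0xFF, bytes);
    auto stop = std::chrono::steady_clock::now();
    volatile char sink = buf[0];
    (void)sink;
    double seconds = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(bytes) * iterations / seconds / 1e9;
}
```

Sweeping `bytes` from a few KB to tens of MB is what exposes the L1/L2/L3/DRAM plateaus the question is asking about.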

Is using a pointer or reference to access a vector and then iterating through it cache unfriendly?

c++,performance,pointers,vector,cpu-cache
I have a pointer to a vector which is stored in some other object. vector<Thing>* m_pThings; Then when I want to iterate through this vector, I use the following for loop: for (auto& aThing : *m_pThings){ aThing.DoSomething(); } Assume that Thing::DoSomething() exists. My question is: Does this code cause too...
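A self-contained version of the loop in question (with a stand-in `Thing`, since the real one isn't shown): the range-for desugars to iterators over the vector's contiguous element array, so `*m_pThings` costs one extra pointer load up front, not one per element.

```cpp
#include <vector>

struct Thing {
    int value = 0;
    void DoSomething() { ++value; }  // stand-in for the question's method
};

// The pointer is dereferenced once to obtain begin()/end(); the iteration
// itself walks the contiguous element storage, which is cache-friendly.
void process(std::vector<Thing>* m_pThings) {
    for (auto& aThing : *m_pThings) {
        aThing.DoSomething();
    }
}
```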

Array of Structures (AoS) vs Structure of Arrays (SoA) on random reads for vectorization

c++,parallel-processing,vectorization,cpu-cache
My question is in regard to the following phrase from the book: Unfortunately, the SoA form is not ideal in all circumstances. For random or incoherent circumstances, gathers are used to access the data and the SoA form can result in extra unneeded data being read into cache, thus reducing...
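The two layouts the book contrasts can be sketched like this (a generic particle example of my own, not the book's code): SoA keeps each field unit-stride for vectorized sweeps, while AoS keeps one element's fields on a single cache line, which is what a random per-element read wants.

```cpp
#include <cstddef>

// Array of Structures: one particle's fields are adjacent, so a random
// read of a whole particle touches (at most) one cache line.
struct ParticleAoS {
    float x, y, z, w;
};

// Structure of Arrays: each field is its own contiguous array. Sequential
// vectorized sweeps over one field are unit-stride, but a single particle
// is scattered across four separate lines.
struct ParticlesSoA {
    float* x;
    float* y;
    float* z;
    float* w;
};

// Summing one field sequentially favors SoA: fully used, unit-stride lines.
float sum_x_soa(const ParticlesSoA& p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p.x[i];
    return s;
}

// Reading one whole particle at a random index favors AoS: one line fill,
// instead of four gathers that drag in neighbors' unneeded data.
float magnitude2_aos(const ParticleAoS* p, std::size_t i) {
    return p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
}
```

The book's point is the second case: with random indices, SoA gathers pull in cache lines mostly full of other elements' fields.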

Why is a cache read miss faster than a write miss?

c++,performance,caching,cpu-cache
I need to calculate an array (writeArray) using another array (readArray), but the problem is that the index mapping is not the same between the arrays (the value at index x of writeArray must be calculated from the value at index y of readArray), so it's not very cache-friendly. However, I can either...
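The two loop orders the asker can choose between might look like this (a sketch with a permutation standing in for the unspecified index mapping): one streams the writes and gathers the reads, the other streams the reads and scatters the writes. Which wins depends on how the cache handles read misses (loads can overlap and prefetch) versus write misses (read-for-ownership traffic, though store buffers hide some of it).

```cpp
#include <cstddef>

// Variant A: sequential writes, random reads (gather).
void gather_reads(int* writeArray, const int* readArray,
                  const std::size_t* perm, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        writeArray[i] = readArray[perm[i]];
}

// Variant B: sequential reads, random writes (scatter). `inv` is assumed
// to be the inverse of `perm`, so both variants produce the same result.
void scatter_writes(int* writeArray, const int* readArray,
                    const std::size_t* inv, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        writeArray[inv[i]] = readArray[i];
}
```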

Understand a microbenchmark for Cache/RAM access latency

performance,memory-management,benchmarking,cpu-architecture,cpu-cache
In this picture: [pic] I don't really understand this plot. It basically shows the performance of reading and writing from arrays of different sizes with different strides. Each color shows a different array size. I know why the latency increases, but I don't know why it decreases again. So, for example, for L (length...
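Such plots are typically produced by a kernel along these lines (my reconstruction, since the question only shows the picture): walk an array touching every `stride`-th byte. Small strides reuse each fetched cache line; once the stride exceeds the line size, every access is a new line; and as the stride approaches the array length, only a few distinct lines are touched at all, which is why the curves come back down.

```cpp
#include <cstddef>
#include <vector>

// Strided walk over `buf`: touches ceil(buf.size() / stride) elements.
// The number of distinct cache lines touched, not the element count,
// is what drives the measured latency curve.
long strided_sum(const std::vector<char>& buf, std::size_t stride) {
    long sum = 0;
    for (std::size_t i = 0; i < buf.size(); i += stride)
        sum += buf[i];
    return sum;
}
```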

Code duplication reduces effective cache size

c++,cpu-cache
I'm reading a presentation by Scott Meyers, and he mentions this line: Down side of inlining: Code duplication reduces effective cache size. I don't see how code duplication has anything to do with effective cache size ...

How many bits are in the address field for a directly mapped cache?

caching,system,cpu,computer-architecture,cpu-cache
This is a question based on Direct Mapped Cache so I am assuming that it's ok to ask here as well. Here is the problem I am working on: The Problem: " A high speed workstation has 64 bit words and 64 bit addresses with address resolution at the byte...
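Since the full problem statement is truncated above, here is the generic breakdown such problems use, with illustrative numbers rather than the question's actual parameters: for a byte-addressed, direct-mapped cache, offset bits = log2(block size in bytes), index bits = log2(number of lines), and the tag is whatever remains of the address width.

```cpp
#include <cstdint>

struct CacheFields {
    unsigned tag, index, offset;
};

// Integer log2 for power-of-two inputs.
constexpr unsigned log2u(std::uint64_t v) {
    unsigned b = 0;
    while (v > 1) { v >>= 1; ++b; }
    return b;
}

// address bits = tag bits + index bits + offset bits
constexpr CacheFields split_address(unsigned addr_bits,
                                    std::uint64_t block_bytes,
                                    std::uint64_t num_lines) {
    unsigned offset = log2u(block_bytes);
    unsigned index  = log2u(num_lines);
    return { addr_bits - index - offset, index, offset };
}
```

For example, a 64-bit byte address with 64-byte blocks and 1024 lines splits into a 6-bit offset, a 10-bit index, and a 48-bit tag.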

Are Lisp lists always implemented as linked lists under the hood?

linked-list,lisp,cpu-cache
Are Lisp lists always implemented as linked lists under the hood? Is this a problem as far as processor caching goes? If so, are there solutions that use more contiguous structures which help caching?...
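The classic cons-cell representation behind Lisp lists can be sketched in C++ like this (a deliberately minimal model, not any particular Lisp's implementation): each cell holds a value and a pointer to the rest of the list, and successive cells may land anywhere on the heap, so traversal can miss the cache on every link, unlike a contiguous std::vector whose neighboring elements share cache lines.

```cpp
// Minimal cons-cell model: car = the element, cdr = the rest of the list.
struct Cons {
    int car;
    Cons* cdr;  // nullptr plays the role of nil
};

// Pointer-chasing traversal: each hop is a dependent load, so a cold list
// pays one potential cache miss per cell.
int sum_list(const Cons* head) {
    int s = 0;
    for (const Cons* p = head; p != nullptr; p = p->cdr)
        s += p->car;
    return s;
}
```

This is also why the contiguous alternatives the question asks about (CDR-coding historically, or vector types in modern Lisps) exist.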