
VTune Amplifier XE 2015 architectural analysis

I recently downloaded the VTune Amplifier XE 2015 to profile applications. For analysis, I want to profile in terms of both architectural and micro-architectural events. I found that it is possible to get the micro-architectural analysis when starting a New Analysis, but was not able to find how to get...

memory segments and physical RAM [closed]

The memory map of a process appears to be fragmented into segments (stack, heap, bss, data, and text). Are these segments just an abstraction for the convenience of the process, with physical RAM being just a linear array of addresses, or is the physical RAM also...

Adding my own library to Contiki OS

I want to add some third-party libraries to Contiki, but at the moment I can't. So I wanted to test with a simple library first. I wrote two files, hello.c and hello.h. In hello.c I have: printf(" Hello everbody, library call\n"); In hello.h I have: extern void print_hello(); I created...

Are correct branch predictions free?

Let's say you write some code that has an if statement, and the condition in that if statement always ends up being true for the entire run of your program, but it can't be known at compile time that the condition is always true (maybe it's specified on the command...

How can a branch instruction be mispredicted AND retired?

Intel has a hardware event counter called BR_MISP_RETIRED.ALL_BRANCHES, whose description says: Mispredicted macro branch instructions retired. But retired instructions are those which were correctly-required: Modern processors execute many more instructions than the program flow needs. This is called "speculative execution". Then the instructions that were "proven" as indeed needed...

function arguments loading to registers on x64

I have this little C code void decode(int *xp,int *yp,int *zp) { int a,b,c; a=*yp; b=*zp; c=*xp; *yp=c; *zp=a; *xp=b; } Then I compiled it to object file using gcc -c -O1 decode.c, and then dumped the object with objdump -M intel -d decode.o and the equivalent assembly code for...
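For reference, the function from the excerpt above can be written out as a commented, compilable sketch. The register names in the comments are an assumption: they follow the System V AMD64 calling convention that gcc uses on Linux, where the first three pointer arguments arrive in rdi, rsi, and rdx. A Windows x64 build would use rcx, rdx, and r8 instead.

```c
/*
 * Rotates three values through their pointers:
 *   *yp receives the old *xp,
 *   *zp receives the old *yp,
 *   *xp receives the old *zp.
 *
 * Assumed System V AMD64 argument registers (not stated in the excerpt):
 *   xp -> rdi, yp -> rsi, zp -> rdx
 */
void decode(int *xp, int *yp, int *zp)
{
    int a, b, c;

    a = *yp;   /* load through the second argument */
    b = *zp;   /* load through the third argument  */
    c = *xp;   /* load through the first argument  */

    *yp = c;   /* store old *xp into *yp */
    *zp = a;   /* store old *yp into *zp */
    *xp = b;   /* store old *zp into *xp */
}
```

Calling decode(&x, &y, &z) with x = 1, y = 2, z = 3 leaves x = 3, y = 1, z = 2. All three loads happen before any store, so the result is the same even though the stores overwrite the inputs.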

For a Single-Cycle CPU, How Much Energy Is Required to Execute an ADD Command?

The question is as stated in the title. Can any expert help?

Do all 64-bit Intel architectures support SSSE3/SSE4.1/SSE4.2 instructions?

I searched the web and the Intel software manual, but I am unable to confirm whether all Intel 64 architectures support up to SSSE3, SSE4.1, SSE4.2, AVX, etc. I want to know so that I can target the minimum supported set of SIMD instructions in my program. Please help.

Multi-level page tables

An x86 with 32-bit addressing and 4K pages would need a page table with 2^20 entries to map an entire address space. Since each page table entry is usually four bytes, this would make the page tables an impractical 4 megabytes long. As a result, paged architectures page...
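The arithmetic in the excerpt above can be checked with a short sketch. The excerpt is truncated before it describes the multi-level scheme, so the 10/10/12 split below is an assumption: it is the classic non-PAE 32-bit x86 layout (10-bit directory index, 10-bit table index, 12-bit page offset). All function and macro names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4 KiB pages: 2^12 bytes per page */
#define PTE_SIZE   4u   /* bytes per page table entry       */

/* Entries needed by a single flat table covering a 32-bit address space. */
uint32_t flat_table_entries(void)
{
    return UINT32_C(1) << (32u - PAGE_SHIFT);   /* 2^20 */
}

/* Size in bytes of that flat table. */
uint32_t flat_table_bytes(void)
{
    return flat_table_entries() * PTE_SIZE;     /* 2^20 * 4 = 4 MiB */
}

/* Assumed two-level split of a 32-bit virtual address (10/10/12). */
uint32_t dir_index(uint32_t va)   { return va >> 22; }
uint32_t table_index(uint32_t va) { return (va >> PAGE_SHIFT) & 0x3FFu; }
uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }
```

With this split, each second-level table holds 1024 four-byte entries, which is exactly one 4K page, and only the tables for regions that are actually mapped need to exist.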

Meaning of this set of instructions in Mic-1 [MAL Language]

The sequence of Mic-1 instructions below realizes a new instruction bish8pu x (x is an 8-bit offset in binary code). What is the meaning of this set of instructions? bish8pu1 MAR=SP bish8pu2 H=TOS << 8 bish8pu3 TOS=MDR=MBRU OR H;wr bish8pu4 PC=PC+1;fetch bish8pu5 goto Main1 Thanks a lot...

Can an x86_64 CPU execute two identical operations in the same pipeline stage?

As is known, Intel x86_64 processors are not only pipelined but also superscalar. This means that the CPU can: Pipeline - in one clock, execute some stages of one operation. For example, two ADDs in parallel with shifting of stages: ADD(stage1) -> ADD(stage2) -> nothing nothing -> ADD(stage1) -> ADD(stage2)...

How can I know whether my CPU shares the vector registers among the cores or each core has its private ones

How can I know whether my CPU shares the vector registers among the cores or each core has its private ones? Where can I find references? I hope to use multi-threading and SIMD to optimise my program's floating-point computation. Will they cause any conflicts?...

Effective Address calculation time on 8086/8088

I've started to implement an 8086/8088 with the goal of being cycle-exact. I can understand the reasoning behind the number of clock cycles for most instructions; however, I must say I'm quite puzzled by the Effective Address (EA) calculation time. More specifically, why does computing BP + DI or BX...

dumpbin reporting wrong target architecture for a static library

I don't understand why dumpbin is returning x64 when executing the following on the Visual Studio command line: dumpbin libgmp.lib /HEADERS |more This is the GMP library compiled under Cygwin 32bit version, with the following build configuration: ./configure --host=i386 ABI=32 The build system compiled and built all the files successfully...

How to mount the huge tlb (huge page) as a file system?

Here are my machine details (Ubuntu): $ uname -a Linux rex-think 3.13.0-46-generic #76-Ubuntu SMP Thu Feb 26 18:52:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux I have enabled huge pages as the root user with: $ echo 20 > /proc/sys/vm/nr_hugepages Now I want to mount the huge pages as a file system and open it...

Understand a microbenchmark for Cache/RAM access latency

I don't really understand the plot in this picture. It shows the performance of reading from and writing to arrays of different sizes with different strides. Each color shows a different array size. I know why it increases, but I don't know why it decreases. So, for example for L (length...

Why not double the number of registers for fast syscalls?

We are facing two facts: 1. Syscalls are expensive. A program must save its state on the stack and trap to the kernel, which causes cache and TLB invalidation, etc. 2. With new technologies (like 14nm) we have plenty of space on chips. Why not have two sets of registers and two TLBs?...

Index register in CPU (Computer org. and arch.)

Can an index register hold a negative value? For example, Xr starts at 0 and then we need to decrement it. What will the value of Xr be?
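One way to explore the question above is to model the register in C. This is only a sketch under stated assumptions: the question does not say how wide Xr is or which machine it belongs to, so a 16-bit register with two's complement interpretation is assumed here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Decrement a hypothetical 16-bit index register. The register stores only
 * a bit pattern, so decrementing 0 wraps around to 0xFFFF. */
uint16_t xr_decrement(uint16_t xr)
{
    return (uint16_t)(xr - 1u);
}

/* Read the same 16 bits as a two's complement signed value.
 * Patterns with the top bit set represent negative numbers. */
int16_t xr_as_signed(uint16_t xr)
{
    if (xr >= 0x8000u) {
        return (int16_t)((int32_t)xr - 65536);
    }
    return (int16_t)xr;
}
```

Under these assumptions, decrementing Xr from 0 leaves the bit pattern 0xFFFF, which is 65535 when read as unsigned and -1 when read as two's complement. Whether the hardware treats it as negative depends on how the addressing logic interprets the register.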

What is faster: equality check or sign check

I wonder which operation is faster: int c = version1.compareTo(version2); This one: if (c == 1) or this one: if (c > 0) Does a sign comparison use just a one-bit check while an equality comparison uses subtraction, or is that not true? For certainty, let's say we work on x86. P.S....

FPGA verilog code upload speed and size limit

I have two questions about FPGAs: 1. I would like to know how large the FPGA chip would be if I created a full CPU with a pipeline. Is there a calculation method or paper that describes how I can calculate the chip size? 2. If I upload fairly reasonable functions (or modules)...