17,965 questions
0
votes
0
answers
33
views
how to implement blocking system call in xv6?
I am currently reading the xv6 kernel source ported to 32-bit x86. My question is regarding a blocking I/O syscall, i.e., bread. I am expecting some trap handler in xv6 that sets its own process state to ...
0
votes
1
answer
61
views
How to write a userspace or kernel application that would allow me to generate a lot of asynchronous interrupts on x86_64 Linux?
I am studying a performance (progress guarantee?) problem in x86 hypervisor software. The current hypothesis is like this. There is a high intensity of interrupt requests caused by concurrently ...
0
votes
0
answers
32
views
What protocol does the LLC directory use to synchronize parallel RFO signals?
The MESI or MOESI protocols need the LLC directory in order to work... and the directory needs to synchronize parallel RFO + snoop-invalidation calls in order for it to work
(in TSO architectures that ...
3
votes
0
answers
99
views
IPC collapse with larger loop bodies despite constant I-cache miss rate, what's the bottleneck?
I'm seeing dramatic instructions-per-cycle collapse (2.08 -> 1.30) when increasing loop body size in simple arithmetic code with no branches, but instruction cache miss rate stays exactly constant ...
-1
votes
0
answers
136
views
Kernel Boot Time Calculation using TSC [closed]
I am trying to measure kernel boot time using TSC, but I consistently see a deviation of around 200 ms, even though I am reading the TSC values at both the start and end of the kernel boot process. I ...
1
vote
0
answers
52
views
How to initialize stack pointer in x86 assembler on Linux [duplicate]
Given the example of a simple program for GNU assembler on i386 architecture in Linux:
.section .data
msg: .ascii "Hi, People!\n"
len = . - msg
.section .text
.global _start
_start:
# ...
6
votes
1
answer
226
views
How to use plain RDTSC without using asm?
I want to use RDTSC in Rust to benchmark how many ticks my function takes.
There's a built-in std::arch::x86_64::_rdtsc, alas it always translates into:
rdtsc
shl rdx, 32
or rax, rdx
...
2
votes
1
answer
84
views
Is there a seq_cst sequence between different parts of an atomic object when atomic operations with different sizes are mixed?
Updated:
I already know that this is UB in ISO C; I apologize for the vague statement I made earlier.
This question originates from my previous question
Can atomic operations of different sizes be ...
3
votes
1
answer
113
views
What is the overhead of jumps and call-rets for CPU front-end decoder?
How do jumps and call-ret pairs affect the CPU front-end decoder in the best-case scenario, when there are few instructions, they are well cached, and branches are well predicted?
For example, I run a ...
-1
votes
0
answers
92
views
Find first bit set within an AVX-512 register [duplicate]
Is there a way to get the index of the first bit set within an AVX-512 register?
I am looking at the Intel Intrinsics Guide but not finding anything.
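For orientation, here is a minimal portable sketch of what "index of the first bit set" could mean for a 512-bit value. It is not an intrinsic-based answer: it assumes the register has already been stored to memory as eight 64-bit lanes (for example with `_mm512_storeu_si512`), that lane 0 holds bits 0..63, and that "first" means lowest-order. The names `first_set_bit_512` and `lowest_set_bit_u64` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index (0..63) of the lowest set bit in a nonzero 64-bit word.
 * Compilers often provide a faster builtin for this (e.g.
 * __builtin_ctzll in GCC/Clang); a plain loop keeps the sketch portable. */
static int lowest_set_bit_u64(uint64_t x)
{
    int n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: treats lanes[0..7] as one 512-bit value, with
 * lanes[0] holding bits 0..63 (an assumption of this sketch).
 * Returns the index 0..511 of the lowest set bit, or -1 if all bits are 0. */
int first_set_bit_512(const uint64_t lanes[8])
{
    assert(lanes != NULL);
    for (int i = 0; i < 8; i++) {
        if (lanes[i] != 0)
            return i * 64 + lowest_set_bit_u64(lanes[i]);
    }
    return -1;
}
```

With only bit 5 of lane 3 set, `first_set_bit_512` returns 3 * 64 + 5 = 197; with all lanes zero it returns -1.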
3
votes
1
answer
101
views
L1-dcache-stores, LLC-stores, cache-references and uncore memory counter don't add up in Linux perf?
I am trying to measure memory bus related performance of a simple test program on an Intel N150 (Twin Lake, which has four Gracemont cores, like Alder Lake E-cores).
PMU counters from perf stat don't ...
2
votes
1
answer
112
views
How do I reconcile the dual array problem with the nature of hardware gather/scatter?
Say I have an array of a given object type which keeps the index to a target in the same array.
struct type_1 { float data; int target_index; };
struct type_1 first_array[1024];
first_array[0]....
0
votes
0
answers
70
views
How to compile an assembly program when using other libraries
When I try to use the InitWindow function from raylib using this code:
global _main
extern InitWindow
extern _ExitProcess@4
section .data
title db "Window Title",0
section .text
_main:...
1
vote
0
answers
88
views
What's the difference between a label and a constant in x64 AT&T assembly [duplicate]
Some context behind the question. I tried writing a simple exit call like this
.data
.equ EXIT, 60
.equ STATUS, 0
.text
movq EXIT, %rax
movq STATUS, %rdi
syscall
however the code fails with a ...
4
votes
1
answer
149
views
On x86-64 can aligned writes to *code* be assumed to be read atomically by other cores?
I'm investigating the possibility of cross-modifying (hotpatching) code without pausing other threads.
The Intel and AMD manuals specifically document that aligned writes to memory of 1, 2, 4 or 8 ...