Comments on: Gallery of Processor Cache Effects

By: Alan

Alan — Mon, 28 Sep 2020 07:29:35 +0000

Hello, is it possible to run these tests in a language like Python and get similar results?

By: shahrestanbar

shahrestanbar — Wed, 08 Jul 2020 07:15:12 +0000

Hi, I was curious whether anyone has tried example 3. I’m not sure how that example works; it looks very brief.

Thanks

By: Darnell

Darnell — Wed, 26 Feb 2020 20:20:18 +0000

In example 5 your terminology is a bit inconsistent. You seem to use the words ‘slot’ and ‘line’ interchangeably, and you referred to an “8MB cache line” at one point.

By: Justin

Justin — Tue, 26 Mar 2019 16:50:04 +0000

BTW, it's worth noting that those loops do not actually do integer multiplication; the compiler should optimize *=3 into a couple of adds (or a single lea on x86) plus the loads/stores.
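
To illustrate the arithmetic behind that rewrite, here is a minimal Python sketch. It only checks that multiplying by 3 equals two additions, and equals the base + index*2 form that an x86 lea can compute. The function names are made up for illustration, and this does not show what any particular compiler actually emits. Python integers also do not wrap on overflow the way fixed-width machine integers do, although the identity holds under wraparound as well.

```python
def times3_multiply(x: int) -> int:
    # The operation as written in the source: x *= 3
    return x * 3


def times3_adds(x: int) -> int:
    # Strength-reduced form: two additions instead of a multiply
    doubled = x + x
    return doubled + x


def times3_lea_style(x: int) -> int:
    # Mirrors the base + index*scale addressing form: x + x*2
    return x + (x << 1)


samples = range(-1000, 1001)
all_equal = all(
    times3_multiply(v) == times3_adds(v) == times3_lea_style(v)
    for v in samples
)
print(all_equal)
```

Either way the loads and stores remain, which is the part of the loop the cache effects are about.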

By: Anonymous

Anonymous — Thu, 26 Apr 2018 09:31:31 +0000

Hi, I tested example 1 on my computer and got the results below. Why are they not the same as the ones you mentioned?

test01 case1, time cost=580
test01 case2, time cost=70

By: Anonymous

Anonymous — Tue, 28 Mar 2017 09:46:26 +0000

Very useful and very interesting! Thanks for the great post!

By: j b

j b — Sat, 21 Jan 2017 23:28:59 +0000

I’m shocked you dared to use C# for such a test. The world has lost its mind.

By: vineet

vineet — Sat, 14 Jan 2017 01:37:48 +0000

Thanks for the great info. I have a few questions. How did you calculate those cache misses? How did you set up the tests?

By: jason

jason — Wed, 23 Nov 2016 23:09:41 +0000

Thanks for this. I’ve done similar experiments in the past, and this is one reason why hardware acceleration can be so effective on problems people assume a CPU should handle well.

By: A.Samih

A.Samih — Sat, 04 Jun 2016 05:40:39 +0000

Regarding Part 3: The code mentioned can’t defeat a prefetcher as the stride is pretty predictable. You would need a pseudo-random circular pattern and a pointer chasing approach to achieve that.
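
A rough sketch of the access pattern described above, in Python. It uses Sattolo's algorithm (my choice, not something from the article) to build a random permutation that forms a single cycle, then walks it so that each index depends on the value just loaded. The names `build_single_cycle` and `chase` are hypothetical. Python lists store references to objects, so this shows the pattern only; for meaningful latency numbers the same table would need to be a plain integer array in a compiled language.

```python
import random


def build_single_cycle(n: int, seed: int = 0) -> list:
    """Return a permutation of range(n) that forms one cycle of length n
    (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int, start: int = 0) -> int:
    """Follow the chain: each load's address depends on the previous load."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


table = build_single_cycle(1000)
# Because the permutation is one cycle, n steps return to the start
# after touching every slot exactly once.
print(chase(table, len(table)) == 0)
```

Because the next index is not known until the current load completes, a stride-based prefetcher has no regular pattern to follow, and the single cycle guarantees the walk covers the whole table rather than looping in a small subset.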