Loop unrolling

AtW replied

11 February 2010, 10:27
Loop unrolling these days is executing in parallel on multiple cores.
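One common way to do that in C is OpenMP. Note that it is a different technique from unrolling: it splits the iterations across threads instead of duplicating the loop body. This is a minimal sketch, assuming a compiler with OpenMP support (GCC or Clang with `-fopenmp`). `parallel_sum` is a hypothetical name used for illustration.

```c
#include <stddef.h>

/* Sum an array with the iterations split across cores by OpenMP.
 * Build with -fopenmp (GCC/Clang). Without it the pragma is ignored
 * and the loop simply runs on one core, giving the same result
 * (up to floating-point rounding order). */
double parallel_sum(const double *a, size_t n)
{
    double total = 0.0;
    /* Each thread sums its own chunk into a private copy of `total`;
     * the reduction clause adds the partial sums together at the end. */
#pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += a[i];
    }
    return total;
}
```

For example, summing the values 1 to 1000 gives 500500.0 whether or not OpenMP is enabled. Whether the threaded version is actually faster depends on the array size and the core count, so it still needs timing.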

HTH
VectraMan replied

11 February 2010, 10:22
Originally posted by bobhope

At least they taught what loop unrolling / rolling is on his course, unlike most courses nowadays, which consist of "using PowerPoint".

Or worse, Java.

Loop unrolling would work pre-Pentium 4 (I think that was the one). That's because the jump would stall the clever look-ahead / out-of-order execution stuff. But with the P4, the clever stuff would guess the most likely outcome of a jump before it got there and assume you'd continue round the loop: no penalty at all for looping, just a small penalty for the exit case. It'd probably be faster not unrolled, because there was less code to load.

I went to an Intel course once where they taught this. And what worked on the pre-P4 chips (I'm racking my brain trying to remember the names), they told us to forget for the P4.
Scary replied

11 February 2010, 10:22
Originally posted by bobhope

At least they taught what loop unrolling / rolling is on his course, unlike most courses nowadays, which consist of "using PowerPoint".

On a Computer Science/Software Engineering curriculum? I don't think so.
bobhope replied

11 February 2010, 09:56
At least they taught what loop unrolling / rolling is on his course, unlike most courses nowadays, which consist of "using PowerPoint".
AtW replied

11 February 2010, 09:54
Originally posted by Scary

It's usually better to let the compiler (optimiser) decide. You should be able to tell it whether you prefer space or speed too.

The art of optimisation is pretty much lost.
Churchill replied

11 February 2010, 09:29
Originally posted by d000hg

Exactly. Usually. Not always... sometimes unrolling or employing SIMD ops (which compilers still suck at) are effective. Sometimes very effective, in the SIMD case. But only after profiling.

I use the supplied intrinsics rather than rely on the compiler; however, Intel's performance libraries are pretty sweet.

Originally posted by Threaded

Then we're into "can you prove it in O notation?" It's like, err, how's about trying the 'stopwatch method'?

I listened to a podcast about "Critical Software Development" by Martin Thomas. Apparently we (C & C++) developers shouldn't be allowed anywhere near it... Oh well...

Last edited by Churchill; 11 February 2010, 09:34.
DimPrawn replied

11 February 2010, 09:29
Hey Threaded, leave them kids alone, all in all you're just another brick in the wall...
threaded replied

11 February 2010, 08:50
Originally posted by d000hg

It does work in the right places. I used it on a tight inner loop for a 3D render engine that was looping through every pixel on the screen many times a second. Of course, the code inside the loop was absolutely optimised to minimise the number of cycles (about 10) so the loop overhead was actually significant.

I agree it's not something you'd expect in a normal business software application.

But surely that's not all you did? I would imagine you needed to ensure the code and data were nicely aligned and didn't cross page/cache boundaries, etc., etc. It essentially ended up a very hand-crafted affair, and most of what you did is contrary to what the 'books' say.
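The data-alignment part can be sketched with C11 `aligned_alloc`. This assumes a 64-byte cache line, which is typical on x86 but not universal, and a C11 standard library. `alloc_pixels_aligned` and `CACHE_LINE` are hypothetical names. Code alignment is a separate matter, handled by compiler and linker options, and isn't shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size: 64 bytes is common on current x86 parts,
 * but check the target you're actually tuning for. */
#define CACHE_LINE 64

/* Allocate a pixel buffer whose start sits on a cache-line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so round the byte count up first. Release the buffer with free(). */
uint8_t *alloc_pixels_aligned(size_t count)
{
    size_t bytes = (count + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (bytes == 0) {
        bytes = CACHE_LINE;  /* avoid a zero-size request */
    }
    return aligned_alloc(CACHE_LINE, bytes);
}
```

MSVC's C runtime doesn't provide `aligned_alloc`. There, `_aligned_malloc` and `_aligned_free` are the rough equivalents.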

My point is that the theoretical model of a computer taught on CompSci courses is so far from reality, so borked, that the schools end up teaching things that are not only wrong but exactly the opposite of what happens in the real world. And without the real lab experience you get in proper science and engineering disciplines, these are the people who eventually end up building the totally screwed IT systems you meet at clients.

At a client they had this parallel application. They'd hand-unrolled loops and done all sorts, and it was getting slower and slower. This was blamed on the extra features that had been added at the same time...

I rolled them loops up, and the thing started running super-linear.

Then we're into "can you prove it in O notation?" It's like, err, how's about trying the 'stopwatch method'?
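The 'stopwatch method' can be as simple as timing the rolled and unrolled versions on the same data. This rough sketch uses standard C `clock()`, which measures processor time. That is adequate for a single-threaded loop but not for the parallel case. `stopwatch`, `sum_rolled` and `sum_unrolled4` are hypothetical names. Calling through a `volatile` function pointer stops the compiler from treating the call as a pure function and hoisting it out of the timing loop.

```c
#include <stddef.h>
#include <time.h>

typedef long (*sum_fn)(const int *a, size_t n);

/* Plain rolled loop: one add, one compare and one branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Same sum unrolled by 4, plus a tail loop for the last 0-3 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Call f(a, n) `reps` times and return the elapsed processor time in
 * seconds. Results are accumulated into *sink so the work is observable,
 * and the volatile function pointer forces a real call every repetition. */
double stopwatch(sum_fn f, const int *a, size_t n, int reps, long *sink)
{
    sum_fn volatile call = f;
    long acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        acc += call(a, n);
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For a real comparison, use a buffer of millions of elements and enough repetitions that each run takes at least a second or so. Compile at the optimisation level you actually ship, then compare `stopwatch(sum_rolled, ...)` against `stopwatch(sum_unrolled4, ...)`. Which one wins depends on the compiler and the CPU.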
OwlHoot replied

11 February 2010, 08:48
Originally posted by threaded

I had some plonker try and tell me 'loop unrolling' was a good way to up the speed of this application we were looking at.

I enquired if they'd ever actually really tried it... Of course not.

So, with my "let me show you what experience is" hat on, I flicked the compile flag, recompiled, ran it and, lo and behold, it slowed everything right down.

Turned out it was some tulip they were taught on their degree about improving performance.

Give me strength! They don't half teach some flipping rubbish on these computer science courses.

A lot depends on how large the caches are, whether memory vs. disk or, at a lower level, address translation caches, prefetch caches, etc.
d000hg replied

11 February 2010, 07:58
Exactly. Usually. Not always... sometimes unrolling or employing SIMD ops (which compilers still suck at) are effective. Sometimes very effective, in the SIMD case. But only after profiling.
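Here is a minimal sketch of what 'employing SIMD ops' by hand looks like. It uses the SSE intrinsics from `<xmmintrin.h>`, which add four floats per instruction. It assumes an x86 compiler that defines `__SSE__` (or MSVC targeting x64) and falls back to a plain loop elsewhere. `add_floats` is a hypothetical name. A modern compiler may well auto-vectorise a loop this simple on its own, so profile both versions.

```c
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

/* Element-wise add: out[i] = a[i] + b[i].
 * With SSE available, four floats are handled per iteration using
 * unaligned loads/stores; a scalar loop finishes the remainder
 * (or does the whole job when SSE isn't available). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__) || defined(_M_X64)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* load 4 floats, no alignment needed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail: the last 0-3 elements, or everything without SSE. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```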
Scary replied

11 February 2010, 07:52
It's usually better to let the compiler (optimiser) decide. You should be able to tell it whether you prefer space or speed too.
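GCC offers global switches for this: `-O2` or `-O3` for speed, `-Os` for size, and `-funroll-loops` to unroll more aggressively. GCC 8 and later also accept a per-loop hint, so you can ask for unrolling only where profiling says it helps and leave the rest to the optimiser. This is a minimal sketch. `sum_hinted` is a hypothetical name, and other compilers may ignore the pragma with a warning or use their own spelling.

```c
#include <stddef.h>

/* Ask for this one loop to be unrolled by a factor of 4.
 * `#pragma GCC unroll n` is recognised by GCC 8 and later; a compiler
 * that doesn't know it typically warns and compiles the loop as written. */
long sum_hinted(const int *a, size_t n)
{
    long s = 0;
#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

The GCC manual says a factor of 0 or 1 blocks unrolling of that loop, which is the "prefer space" end of the same knob.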
d000hg replied

11 February 2010, 07:40
It does work in the right places. I used it on a tight inner loop for a 3D render engine that was looping through every pixel on the screen many times a second. Of course, the code inside the loop was absolutely optimised to minimise the number of cycles (about 10) so the loop overhead was actually significant.
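Hand-unrolling a per-pixel loop looks something like this minimal sketch. It assumes an 8-bit greyscale buffer and uses a trivial per-pixel operation (inversion) as a stand-in for the real ~10-cycle body. `invert_pixels_unrolled` is a hypothetical name. The point is that the loop test and branch run once per four pixels instead of once per pixel, which only matters when the body itself is this cheap.

```c
#include <stddef.h>
#include <stdint.h>

/* Invert an 8-bit greyscale buffer in place, four pixels per iteration. */
void invert_pixels_unrolled(uint8_t *px, size_t count)
{
    size_t i = 0;

    /* Main unrolled body: one compare-and-branch per four pixels. */
    for (; i + 4 <= count; i += 4) {
        px[i]     = (uint8_t)(255 - px[i]);
        px[i + 1] = (uint8_t)(255 - px[i + 1]);
        px[i + 2] = (uint8_t)(255 - px[i + 2]);
        px[i + 3] = (uint8_t)(255 - px[i + 3]);
    }

    /* Remainder loop for the last 0-3 pixels when count isn't a multiple of 4. */
    for (; i < count; i++) {
        px[i] = (uint8_t)(255 - px[i]);
    }
}
```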

I agree it's not something you'd expect in a normal business software application.
threaded started a topic Loop unrolling

11 February 2010, 05:35

I had some plonker try and tell me 'loop unrolling' was a good way to up the speed of this application we were looking at.

I enquired if they'd ever actually really tried it... Of course not.

So, with my "let me show you what experience is" hat on, I flicked the compile flag, recompiled, ran it and, lo and behold, it slowed everything right down.

Turned out it was some tulip they were taught on their degree about improving performance.

Give me strength! They don't half teach some flipping rubbish on these computer science courses.