
C# vs C++ performance


    #11
    From my experience, this kind of number crunching is where you'll expose the worst Java/C# performance, because each operation is tiny, so ANY overhead is significant: you're running a very tight loop with not much happening inside.

    It is interesting that it doesn't optimise better; perhaps the underlying data structures/objects have extra safety code running? I don't know what algorithms are used, but I wouldn't bet that C# and C++ use exactly the same ones for all collections/algorithms. No doubt one can find out, though with the STL it is implementation-specific, IIRC.
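    The tight-loop point can be illustrated with a sketch (my own, not any of the benchmarks from this thread - class and method names are made up for illustration):

```java
import java.util.concurrent.TimeUnit;

// Sketch of a "tight loop" micro-benchmark: each iteration does almost
// nothing, so any per-iteration overhead (bounds checks, dispatch) is a
// large fraction of the total time.
public class TightLoop {
    static long sum(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i]; // tiny body: overhead dominates, not the work
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        long t0 = System.nanoTime();
        long s = sum(a);
        long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        System.out.println("sum=" + s + " in " + ms + "ms");
    }
}
```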
    Originally posted by MaryPoppins
    I'd still not breastfeed a nazi
    Originally posted by vetran
    Urine is quite nourishing



      #12
      Originally posted by VectraMan
      That depends on what you mean. With C++ .NET you can do "pure" CLR, which means all .NET, and I would imagine that'd be much the same as C# because it's doing the same thing. Or you can do mixed CLR with native code, so assuming you do the faster sort with native code it'll be much the same as the regular C++ performance.

      No doubt someone will say that proves it's not the language but the platform; but a) C++ .NET isn't C++, and b) of course it's the platform, but as you can't separate C# from .NET, that's the only thing you can test.
      I meant identical code compiled as /clr:pure. I had a look at sorting small structures that contain a key and array index, as I think that's more like something I would do. Disregarding setup, allocation, copying etc., and just timing the call to the sort routines, I got:

      Native takes ~70ms for 1/2 million items
      C++ CLR takes ~150ms
      C# takes ~400ms using Array.Sort<T> and IComparer
      C# takes ~150ms using the "indirect" sort Array.Sort<int, T>

      So the native code seems to have a 2x advantage, and the best C# method seems to vary quite a bit depending on what you are sorting; i.e. for the downloaded code the use of a delegate was fastest, while I found IComparer faster and the indirect sort faster still.

      I will knock something similar up in Java later if I can be arsed.

      Edit: Java takes about 500ms out of the box (the code is a JUnit test BTW, force of habit), but that can be reduced quite a lot, e.g. by using the latest VM instead of 1.5 and tuning the VM options, so I'm down to sub-200ms now.

      Code:
      import java.util.Arrays;
      
      import org.junit.Test;
      
      public class BigSort {
      
          // "Fat" record: the 128-byte payload lives inside the sorted object
          public class Data implements Comparable<Data> {
              int key = (int) Math.round(Math.random() * (1024 << 10));
              byte[] data = new byte[128];
              public int compareTo(Data o) {
                  return key - o.key; // keys are non-negative, so no overflow
              }
          }
      
          // "Thin" record: just a key and an index into an external array,
          // the Java counterpart of the "indirect" C# sort above
          public class Ref implements Comparable<Ref> {
              int key = (int) Math.round(Math.random() * (1024 << 10));
              int data;
              public int compareTo(Ref o) {
                  return key - o.key;
              }
          }
      
          public long dataSort(int size) {
              Data[] data = new Data[size];
              for (int i = 0; i < size; i++) {
                  data[i] = new Data();
              }
              long start = System.nanoTime();
              Arrays.sort(data);
              return System.nanoTime() - start;
          }
      
          public long refSort(int size) {
              Ref[] ref = new Ref[size];
              for (int i = 0; i < size; i++) {
                  ref[i] = new Ref();
              }
              long start = System.nanoTime();
              Arrays.sort(ref);
              return System.nanoTime() - start;
          }
      
          @Test
          public void testSorting() {
              // sizes from 1K up to 512K items, each averaged over 5 runs
              for (int i = 0; i < 10; i++) {
                  int size = 1024 << i;
      
                  long time = 0;
                  for (int j = 0; j < 5; j++) {
                      time = time + dataSort(size);
                  }
                  time = time / 5;
                  System.out.println("data " + size + " = " + time + " ns");
      
                  time = 0;
                  for (int j = 0; j < 5; j++) {
                      time = time + refSort(size);
                  }
                  time = time / 5;
                  System.out.println("ref " + size + " = " + time + " ns");
              }
          }
      
      }
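      A further variation along the same lines (my own sketch, not part of the posted code): pack key and index into a single long and sort primitives, which removes the object and compareTo overhead entirely. This only works because the keys here are non-negative.

```java
import java.util.Arrays;

// Sketch: pack (key, index) into one long so Arrays.sort runs on primitives -
// no objects on the heap, no compareTo calls. Assumes keys are non-negative
// ints (as in the benchmark above), otherwise the sign bit breaks ordering.
public class PackedSort {
    // key in the high 32 bits, original index in the low 32 bits
    static long[] pack(int[] keys) {
        long[] packed = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            packed[i] = ((long) keys[i] << 32) | (i & 0xFFFFFFFFL);
        }
        return packed;
    }

    // returns the original indices in ascending key order
    static int[] sortedIndices(int[] keys) {
        long[] packed = pack(keys);
        Arrays.sort(packed); // dual-pivot quicksort on primitive longs
        int[] order = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            order[i] = (int) packed[i]; // low 32 bits = original index
        }
        return order;
    }
}
```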
      Last edited by doodab; 31 May 2011, 15:00.
      While you're waiting, read the free novel we sent you. It's a Spanish story about a guy named 'Manual.'



        #13
        Originally posted by VectraMan
        The default behaviour of C# is to store a reference to the objects, which means the sort can swap two items around by swapping only the references, whereas in C++ the swap works directly on the objects, so as he had it every swap was two copies of the data in the class. I guess that's typical C++, and that was the case where C++ was half the speed. But I changed it to use a dynamically allocated buffer and rvalue references to do a faster move.

        Which maybe shows that an average C# programmer following typical practices probably produces faster code than an average C++ programmer doing the same, but a good C++ programmer will spot the inefficiency and can produce faster code than is possible in C#.
        As far as I can tell, the whole point of the synthetic test was to sort a bunch of large objects, i.e. including the 128 bytes of data, so that they ended up in order in memory. That was why the original code used an unsafe struct: it was required to get the whole struct allocated in a single chunk. Otherwise you would just hold an int and a reference to the array. I suppose the main thing to take from that is that the layout of data in memory isn't something C# programmers are supposed to worry about.

        Of course, if you are bothered about having the data ordered in memory, then a large byte[] array and Array.Copy might be a better choice.
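        That byte[]-and-copy idea might look something like this in Java, using System.arraycopy (the rough equivalent of Array.Copy); the record layout and names are illustrative, not from the thread:

```java
// Sketch: keep fixed-size records contiguous in one byte[] and reorder them
// by a precomputed sorted index, copying whole records in one block move.
public class ContiguousRecords {
    static final int RECORD = 128; // bytes per record, as in the benchmark

    // Rebuild the buffer with records placed in the order given by 'order'.
    static byte[] reorder(byte[] buf, int[] order) {
        byte[] out = new byte[buf.length];
        for (int i = 0; i < order.length; i++) {
            // block copy of one whole record; no per-element object overhead
            System.arraycopy(buf, order[i] * RECORD, out, i * RECORD, RECORD);
        }
        return out;
    }
}
```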

        Personally I would use C
        Last edited by doodab; 31 May 2011, 14:16.



          #14
          2x performance can be achieved by buying double the number of cores - usually that's cheaper than paying for twice the programmer time spent dealing with buggy C++ code.

          HTH



            #15
            Originally posted by AtW
            2x performance can be achieved by buying double the number of cores - usually that's cheaper than paying for twice the programmer time spent dealing with buggy C++ code.

            HTH
            Only if the code runs in parallel or is sensibly threaded.

            Buying an x-core processor is probably not much use for your average single-threaded noddy C# application. For most other uses, though, faster hardware is far more likely to fix a problem.
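            For what it's worth, the sort itself can be made to use the extra cores: later Java versions (8+, i.e. after this thread) added Arrays.parallelSort, which splits the work across the fork/join pool:

```java
import java.util.Arrays;
import java.util.Random;

// Arrays.parallelSort (Java 8+) divides the array among the common
// fork/join pool, so extra cores actually help - unlike the
// single-threaded Arrays.sort used in the benchmark above.
public class ParallelDemo {
    public static void main(String[] args) {
        int[] a = new Random(42).ints(1 << 20).toArray();
        Arrays.parallelSort(a);
        System.out.println("sorted: " + (a[0] <= a[a.length - 1]));
    }
}
```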
            merely at clientco for the entertainment



              #16
              Originally posted by doodab
              I meant identical code compiled as /clr:pure. I had a look at sorting small structures that contain a key and array index, as I think that's more like something I would do. Disregarding setup, allocation, copying etc., and just timing the call to the sort routines, I got:

              Native takes ~70ms for 1/2 million items
              C++ CLR takes ~150ms
              C# takes ~400ms using Array.Sort<T> and IComparer
              C# takes ~150ms using the "indirect" sort Array.Sort<int, T>

              So the native code seems to have a 2x advantage, and the best C# method seems to vary quite a bit depending on what you are sorting; i.e. for the downloaded code the use of a delegate was fastest, while I found IComparer faster and the indirect sort faster still.

              I will knock something similar up in Java later if I can be arsed.

              Edit: Java takes about 500ms out of the box (the code is a JUnit test BTW, force of habit), but that can be reduced quite a lot, e.g. by using the latest VM instead of 1.5 and tuning the VM options, so I'm down to sub-200ms now.
              Nice work.

              I can't see that any of this is doing any extra bounds checking or anything fundamentally different in the algorithms, so it has to come down to general inefficiencies of the runtime. To be honest, as much as I'm a C++ programmer and obviously wanted my side to win, I'm kind of surprised. I was expecting, with all the improvements in JIT compilers and modern processors, C++ to be getting about 10% at best.

              /clr:pure is a bit pointless really; without the ability to seamlessly integrate with native code I'd never bother with C++ .NET, and would just use C#. If anything, C# is probably easier for an experienced C++ programmer to figure out than C++ .NET.
              Will work inside IR35. Or for food.



                #17
                Originally posted by AtW
                2x performance can be achieved by buying double the number of cores - usually that's cheaper than paying for twice the programmer time spent dealing with buggy C++ code.
                I ran it on a machine with dual cores. And it only used one. So there's your money wasted then.



                  #18
                  Originally posted by VectraMan
                  Nice work.

                  I can't see that any of this is doing any extra bounds checking or anything fundamentally different in the algorithms, so it has to come down to general inefficiencies of the runtime. To be honest, as much as I'm a C++ programmer and obviously wanted my side to win, I'm kind of surprised. I was expecting, with all the improvements in JIT compilers and modern processors, C++ to be getting about 10% at best.

                  /clr:pure is a bit pointless really; without the ability to seamlessly integrate with native code I'd never bother with C++ .NET, and would just use C#. If anything, C# is probably easier for an experienced C++ programmer to figure out than C++ .NET.
                  The algorithms are probably the same, but array bounds checking is done by the runtime; it actually generates extra instructions to do it. One of the common optimisations is eliminating the checks when the compiler and VM can guarantee they won't be a problem, e.g. for array access in a loop whose upper bound is the size of the array. So without looking at the libraries and the generated code in detail, it's impossible to tell whether that is happening or not.
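                  The two loop shapes might look like this; whether the check is actually elided depends on the particular VM and JIT, so treat this as a sketch:

```java
// Sketch: two loop shapes over the same array. In direct(), the JIT can
// prove i is always in [0, a.length) and typically drops the per-access
// bounds check; in indirect(), the index is data-dependent, so the check
// generally has to stay.
public class BoundsCheck {
    static long direct(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i]; // i provably in range: check can be eliminated
        }
        return s;
    }

    static long indirect(int[] a, int[] idx) {
        long s = 0;
        for (int i = 0; i < idx.length; i++) {
            s += a[idx[i]]; // idx[i] comes from data: check usually remains
        }
        return s;
    }
}
```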

                  I would still argue that for the average server-side business or web app, even a 2x speedup on something like that is irrelevant, because what you are speeding up only represents a small fraction of where the time goes anyway. And as you pointed out, the difference between good code and average code can be worth twice as much as the difference between languages, which means I'm probably best off sticking with Java.



                    #19
                    Originally posted by VectraMan
                    I ran it on a machine with dual cores. And it only used one. So there's your money wasted then.
                    Who is making general-purpose single-core CPUs anymore? Most CPUs are multi-core now, so for all intents and purposes dual cores are free.

                    Any code these days that aims for high performance should not only be parallel, to make use of multiple cores on the same box, but should also (if required) scale to multiple boxes working in sync towards the same goal.

                    SKA C# code runs on a cluster with 1 TB of RAM and 144 cores.

                    HTH



                      #20
                      The trouble is there's a lot of propaganda about VMs being just as fast as natively compiled code. That's baloney and always will be. Java and C# are 3GL languages just like any other, but they have their own runtime environments; there's an overhead, and with a large amount of data it is very noticeable. I have to face a horribly slow Java app every day that drags my productivity down. These 3GLs are great for user interfaces but shouldn't really be used for serious "data processing"; you simply throw away CPU "bandwidth".
                      I'm alright Jack

