Previously on "Sorting data by frequency in C#"


  • Ardesco
    replied
    I was thinking more along the lines of sorting them by char length and then searching on first character and iterating down each char of the string.

    Any word of x chars that occurs only once can be discarded, and then you can start filtering the words that all have the same number of chars.
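
    Very roughly, something like this (an untested sketch; the word list, class and method names are just for illustration):

    Code:
        using System;
        using System.Collections.Generic;

        class CharByCharCount
        {
            // Refine a group of equal-length words one character position at a time.
            // Groups that shrink to a single word are discarded along the way.
            static void Refine(List<string> group, int pos)
            {
                if (group.Count < 2) return;                 // lone word: discard it
                if (pos >= group[0].Length)                  // every char matched
                {
                    Console.WriteLine(group[0] + "\t" + group.Count);
                    return;
                }

                Dictionary<char, List<string>> byChar = new Dictionary<char, List<string>>();
                foreach (string w in group)
                {
                    if (!byChar.ContainsKey(w[pos]))
                        byChar[w[pos]] = new List<string>();
                    byChar[w[pos]].Add(w);
                }

                foreach (List<string> sub in byChar.Values)
                    Refine(sub, pos + 1);
            }

            static void Main()
            {
                string[] words = { "cat", "dog", "cat", "mouse", "dog", "cat" };

                // Bucket the words by char length first
                Dictionary<int, List<string>> byLength = new Dictionary<int, List<string>>();
                foreach (string w in words)
                {
                    if (!byLength.ContainsKey(w.Length))
                        byLength[w.Length] = new List<string>();
                    byLength[w.Length].Add(w);
                }

                foreach (List<string> bucket in byLength.Values)
                    Refine(bucket, 0);
            }
        }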



  • xoggoth
    replied
    Dunno. I'd just use an efficient sort like Shell-Metzner and count the number in the same blocks. Perhaps instead of sorting on the entire string you could sort on just the 1st x chars, which would give you many unique items that did not need to be further sorted and could be discarded. For those that did, sort on the next 3 chars, and so on.
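
    Roughly this sort of thing (untested; Array.Sort is standing in for the Shell-Metzner sort, and the input is made up):

    Code:
        using System;

        class SortAndCountBlocks
        {
            static void Main()
            {
                string[] words = { "cat", "dog", "cat", "mouse", "dog", "cat" };

                // Any decent sort will do here; a Shell sort would serve the same purpose
                Array.Sort(words);

                // Identical words are now adjacent, so count the length of each block
                int i = 0;
                while (i < words.Length)
                {
                    int j = i;
                    while (j < words.Length && words[j] == words[i]) j++;
                    Console.WriteLine(words[i] + "\t" + (j - i));
                    i = j;
                }
            }
        }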



  • VectraMan
    replied
    Go through the list once to work out the minimum and maximum length. Then if for example the minimum is 3 and the maximum is 5, you do a search for "aaa", "aaaa", and "aaaaa", then move on to "aaaab", until you've exhausted every possible word.

    HTH.



  • ASB
    replied
    Originally posted by Churchill
    Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
    Not necessarily. It will depend on the actual words and how they map to whatever positioning algorithm is chosen.

    For the terminally bored there is some discussion here:-

    http://forum.java.sun.com/thread.jsp...sageID=4319661



  • Churchill
    replied
    Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
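
    Something along those lines, roughly (an unbalanced tree, untested, and the type and member names are just illustrative):

    Code:
        using System;

        // Each node carries a word and a counter
        class Node
        {
            public string Word;
            public int Count = 1;
            public Node Left, Right;
            public Node(string word) { Word = word; }
        }

        class WordTree
        {
            Node root;

            // Bump the counter if the word is already in the tree, otherwise add a new node
            public void Add(string word)
            {
                if (root == null) { root = new Node(word); return; }
                Node current = root;
                while (true)
                {
                    int cmp = string.CompareOrdinal(word, current.Word);
                    if (cmp == 0) { current.Count++; return; }
                    if (cmp < 0)
                    {
                        if (current.Left == null) { current.Left = new Node(word); return; }
                        current = current.Left;
                    }
                    else
                    {
                        if (current.Right == null) { current.Right = new Node(word); return; }
                        current = current.Right;
                    }
                }
            }

            // In-order walk prints each distinct word with its frequency
            public void Print() { Print(root); }

            static void Print(Node n)
            {
                if (n == null) return;
                Print(n.Left);
                Console.WriteLine(n.Word + "\t" + n.Count);
                Print(n.Right);
            }
        }

    Feed every word through Add, then an in-order walk gives each distinct word with its count; you would still sort those counts at the end to pick out the most frequent ones.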



  • Burdock
    replied
    Written before the request that it should be written in binary and injected straight into the processor!

    Code:
        using System;
        using System.Collections;

        Hashtable wordMap = new Hashtable();

        string[] words = <your array of words here>;

        // Count each non-empty word
        foreach (string currentWord in words)
        {
            if (currentWord.Length > 0)
            {
                if (wordMap.ContainsKey(currentWord))
                {
                    wordMap[currentWord] = (int)wordMap[currentWord] + 1;
                }
                else
                {
                    wordMap.Add(currentWord, 1);
                }
            }
        }

        // Pull the words and their counts out into parallel arrays,
        // then sort both arrays by frequency
        string[] wordNames = (string[])new ArrayList(wordMap.Keys).ToArray(typeof(string));
        int[] wordFrequencies = (int[])new ArrayList(wordMap.Values).ToArray(typeof(int));
        Array.Sort(wordFrequencies, wordNames);

        for (int currentWord = 0; currentWord < wordNames.Length; currentWord++)
        {
            Console.WriteLine(wordNames[currentWord] + "\t" + wordFrequencies[currentWord]);
        }



  • ASB
    replied
    Originally posted by DimPrawn
    Please read the spec:

    Just asking opinions on the most efficient and elegant way to achieve the following.

    HTH


    Yes, I did read that bit. And I conformed with it. The person who is not conforming is you.

    What you are in effect saying at the moment is that a "standard" way of dealing with it is neither the most efficient nor the most elegant.

    You may turn out to be right - but you do not yet know that, and since you will not try a standard method you will never know how efficient it was (or wasn't).



  • Cowboy Bob
    replied
    Doesn't anyone study algorithms anymore? A couple I can think of off the top of my head are Boyer-Moore (or a variation thereof) - http://en.wikipedia.org/wiki/Boyer-Moore - and KMP - http://en.wikipedia.org/wiki/Knuth%E...ratt_algorithm

    That should give you a head start.
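
    Just for a flavour of the Boyer-Moore family, the simplified Horspool variant looks roughly like this (untested sketch):

    Code:
        using System;

        class Horspool
        {
            // Returns the index of the first occurrence of pattern in text, or -1
            static int IndexOf(string text, string pattern)
            {
                int m = pattern.Length, n = text.Length;
                if (m == 0) return 0;
                if (m > n) return -1;

                // Bad-character table: how far we can safely skip when the last
                // compared text character is a given char
                int[] shift = new int[char.MaxValue + 1];
                for (int i = 0; i <= char.MaxValue; i++) shift[i] = m;
                for (int i = 0; i < m - 1; i++) shift[pattern[i]] = m - 1 - i;

                int pos = 0;
                while (pos <= n - m)
                {
                    int j = m - 1;
                    while (j >= 0 && text[pos + j] == pattern[j]) j--;
                    if (j < 0) return pos;                  // full match
                    pos += shift[text[pos + m - 1]];        // skip ahead using the table
                }
                return -1;
            }

            static void Main()
            {
                Console.WriteLine(IndexOf("the quick brown fox", "brown"));   // 10
            }
        }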



  • DimPrawn
    replied
    Originally posted by ASB
    Yes, it is bog standard. But does it work? yes.

    Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.
    Please read the spec:

    Just asking opinions on the most efficient and elegant way to achieve the following.

    HTH



  • ASB
    replied
    This might get you on the road to a clever solution:-

    http://www.awprofessional.com/articl...&seqNum=8&rl=1



  • ASB
    replied
    Originally posted by DimPrawn
    That's a pretty long winded and bog standard solution isn't it?

    I was asking if there were an elegant and perhaps clever algorithmic way.

    I mean, yeah, walk through the list, count them and then sort by frequency.

    Even Milan could have thought of that one.

    I was hoping someone would know of a cutting edge algorithm that trades off memory for a very little CPU use.
    Yes, it is bog standard. But does it work? yes.

    Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.



  • ASB
    replied
    Originally posted by DimPrawn
    I said all the stop words have been removed beforehand.


    How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?

    Great answer with insightful analysis, I can see why they pay you £200/day.


    It was more helpful than yours normally are.

    I dug out an example which I put a link to (yes I know it's .NET 2).

    I'm sure you can manage the sort it will require, though it would probably be quickest just to scan the counts, holding the top 5 in another array, then extract those elements.
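
    Roughly like this, say (untested; counts here is whatever array the counting step produced, and the method name is made up):

    Code:
        using System;

        class TopFive
        {
            // One pass over the counts, keeping the indices of the k largest
            // seen so far (assumes there are at least k counts)
            static int[] TopK(int[] counts, int k)
            {
                int[] topIdx = new int[k];
                int[] topCnt = new int[k];                  // all zero to start
                for (int i = 0; i < k; i++) topIdx[i] = -1;

                for (int i = 0; i < counts.Length; i++)
                {
                    // Find the smallest of the current top-k entries
                    int min = 0;
                    for (int s = 1; s < k; s++)
                        if (topCnt[s] < topCnt[min]) min = s;

                    // Replace it if the current count beats it
                    if (counts[i] > topCnt[min])
                    {
                        topCnt[min] = counts[i];
                        topIdx[min] = i;
                    }
                }
                return topIdx;
            }

            static void Main()
            {
                int[] counts = { 3, 7, 1, 9, 4, 4, 8, 2 };
                foreach (int idx in TopK(counts, 5))
                    Console.WriteLine(idx + " (count " + counts[idx] + ")");
            }
        }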



  • DimPrawn
    replied
    That's a pretty long winded and bog standard solution isn't it?

    I was asking if there were an elegant and perhaps clever algorithmic way.

    I mean, yeah, walk through the list, count them and then sort by frequency.

    Even Milan could have thought of that one.

    I was hoping someone would know of a cutting edge algorithm that trades off memory for a very little CPU use.



  • ASB
    replied
    Originally posted by Churchill
    Quantify "OK"...
    Can't. Depends on the usage pattern, doesn't it.

    Given that the text has to be obtained from somewhere and then split into the words, it may well be that stuffing them into a hashtable is quick enough.

    If it takes, say, 500 ms to get the text and 100 ms to add them, then it's probably OK. If it's the inverse then it probably isn't. I'm sure you could produce something clever that might be better.

    Also I just did a search, here is a possible example:-

    http://blogs.vbcity.com/mcintyre/arc...2/11/7343.aspx

    OK, it's not 1.1 but it would be pretty easy to test the performance.



  • DimPrawn
    replied
    Originally posted by ASB
    Just return:-

    "the" "of" "to" "is" "it" and you'll be right quite a lot of the time.

    Personally I think I'd just load them into a hashtable and see what the performance was like. If it was OK I wouldn't worry about it.
    I said all the stop words have been removed beforehand.


    How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?

    Great answer with insightful analysis, I can see why they pay you £200/day.


