I was thinking more along the lines of sorting them by char length, then searching on the first character and iterating down each char of the string.
Any word whose char count occurs only once can be discarded, and then you can start filtering among the words that share the same number of chars.
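Just to make that concrete, here is a rough sketch of the length-bucketing part (my own illustration, not from the thread, and it assumes generics are available, i.e. .NET 2 rather than 1.1). A word whose length occurs only once can never have a duplicate, so only the surviving buckets would need the char-by-char comparison.
Code:
using System;
using System.Collections.Generic;

class LengthBuckets
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "cat", "horse", "dog", "cat", "dodo", "dog" };

        // Bucket the words by length.
        Dictionary<int, List<string>> byLength = new Dictionary<int, List<string>>();
        foreach (string w in words)
        {
            if (!byLength.ContainsKey(w.Length))
                byLength[w.Length] = new List<string>();
            byLength[w.Length].Add(w);
        }

        // A bucket holding a single word can be discarded outright:
        // that word cannot have a duplicate anywhere in the list.
        foreach (KeyValuePair<int, List<string>> bucket in byLength)
        {
            if (bucket.Value.Count > 1)
                Console.WriteLine("length {0}: {1} candidates to compare char by char",
                                  bucket.Key, bucket.Value.Count);
        }
    }
}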
Previously on "Sorting data by frequency in C#"
-
Dunno. I'd just use an efficient sort like Shell-Metzner and count the number in each block of identical items. Perhaps instead of sorting on the entire string you could sort on just the 1st x chars, which would give you many unique items that did not need to be sorted further and could be discarded. Those that did, sort on the next 3 chars, and so on.
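As a minimal sketch of that sort-then-count-runs idea (my own illustration, leaving out the first-x-chars optimisation; the sample words array is made up): sort once so identical words sit next to each other, then a single pass counts each run.
Code:
using System;

class SortAndCount
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "the", "cat", "sat", "on", "the", "cat" };

        // Sort so identical words end up adjacent.
        Array.Sort(words, StringComparer.Ordinal);

        // One pass over the sorted array: count the length of each run.
        int run = 1;
        for (int i = 1; i <= words.Length; i++)
        {
            if (i < words.Length && words[i] == words[i - 1])
            {
                run++;
            }
            else
            {
                Console.WriteLine("{0}\t{1}", words[i - 1], run);
                run = 1;
            }
        }
    }
}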
-
Go through the list once to work out the minimum and maximum length. Then if, for example, the minimum is 3 and the maximum is 5, you do a search for "aaa", "aaaa", and "aaaaa", then move on to "aaaab", until you've exhausted every possible word.
HTH.
-
Originally posted by Churchill: Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
For the terminally bored there is some discussion here:-
http://forum.java.sun.com/thread.jsp...sageID=4319661
-
Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
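Churchill doesn't spell out the implementation, but as a rough sketch the "binary tree and counter mechanism" might look something like this, using .NET 2's SortedDictionary (a red-black tree internally) to stand in for the tree; the sample words array is made up.
Code:
using System;
using System.Collections.Generic;

class TreeCounter
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "the", "cat", "sat", "on", "the", "mat", "the" };

        // SortedDictionary is a binary search tree keyed on the word,
        // with the counter as the value. Each insert/lookup is O(log n).
        SortedDictionary<string, int> counts = new SortedDictionary<string, int>();
        foreach (string w in words)
        {
            int current;
            counts.TryGetValue(w, out current); // 0 if the word isn't there yet
            counts[w] = current + 1;
        }

        // Keys come back in sorted word order; ordering by frequency would
        // still need a second pass, e.g. copying into parallel arrays as in
        // the Hashtable example further down the thread.
        foreach (KeyValuePair<string, int> pair in counts)
            Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
    }
}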
-
written before the request that it should be written in binary and injected straight into the processor!
Code:
// Count occurrences of each word in a Hashtable (word -> count).
Hashtable wordMap = new Hashtable();
string[] words = <your array of words here>;
foreach (string currentWord in words)
{
    if (currentWord.Length > 0)
    {
        if (wordMap.ContainsKey(currentWord))
        {
            wordMap[currentWord] = (int)wordMap[currentWord] + 1;
        }
        else
        {
            wordMap.Add(currentWord, 1);
        }
    }
}

// Copy keys and counts into parallel arrays, then sort both by frequency.
string[] wordNames = (string[])new ArrayList(wordMap.Keys).ToArray(typeof(string));
int[] wordFrequencies = (int[])new ArrayList(wordMap.Values).ToArray(typeof(int));
Array.Sort(wordFrequencies, wordNames);

for (int currentWord = 0; currentWord < wordNames.Length; currentWord++)
{
    Console.WriteLine(wordNames[currentWord] + "\t" + wordFrequencies[currentWord].ToString());
}
-
Originally posted by DimPrawn: Please read the spec:
Just asking opinions on the most efficient and elegant way to achieve the following.
HTH
Yes, I did read that bit. And I conformed with it. The person who is not conforming is you.
What you are in effect saying at the moment is that a "standard" way of dealing with it is neither the most efficient nor the most elegant.
You may turn out to be right - but you do not yet know that, and since you will not try a standard method you will never know how efficient it was (or wasn't).
-
Doesn't anyone study algorithms anymore? A couple I can think of off the top of my head are Boyer-Moore (or a variation thereof) - http://en.wikipedia.org/wiki/Boyer-Moore - and KMP - http://en.wikipedia.org/wiki/Knuth%E...ratt_algorithm
That should give you a head start.
-
Originally posted by ASB: Yes, it is bog standard. But does it work? Yes.
Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.
Just asking opinions on the most efficient and elegant way to achieve the following.
HTH
-
This might get you on the road to a clever solution:-
http://www.awprofessional.com/articl...&seqNum=8&rl=1
-
Originally posted by DimPrawn: That's a pretty long-winded and bog standard solution, isn't it?
I was asking if there were an elegant and perhaps clever algorithmic way.
I mean, yeah, walk through the list, count them and then sort by frequency.
Even Milan could have thought of that one.
I was hoping someone would know of a cutting-edge algorithm that trades off memory for very little CPU use.
Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.
-
Originally posted by DimPrawn: I said all the stop words have been removed beforehand.
How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?
Great answer with insightful analysis, I can see why they pay you £200/day.
I dug out an example which I put a link to (yes I know it's .NET 2).
I'm sure you can manage the sort it will require, though it would probably be quickest just to scan the counts holding the top 5 in another array then extract those elements.
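For what that "hold the top 5 in another array" suggestion might look like, here's a rough sketch (my own illustration; the wordNames/wordFrequencies parallel arrays are assumed to come out of the Hashtable example posted above). One pass keeps the five largest counts without sorting everything.
Code:
using System;

class TopFive
{
    static void Main()
    {
        // Made-up parallel arrays, as produced by the Hashtable example.
        string[] wordNames = { "the", "cat", "sat", "on", "mat", "a", "dog" };
        int[] wordFrequencies = { 42, 7, 3, 11, 5, 30, 2 };

        const int topN = 5;
        string[] topNames = new string[topN];
        int[] topCounts = new int[topN];

        // Single pass: slot each word into the small top-5 arrays,
        // shifting lower entries down. No full sort of the counts needed.
        for (int i = 0; i < wordNames.Length; i++)
        {
            for (int slot = 0; slot < topN; slot++)
            {
                if (wordFrequencies[i] > topCounts[slot])
                {
                    // Shift everything below this slot down one place.
                    for (int k = topN - 1; k > slot; k--)
                    {
                        topCounts[k] = topCounts[k - 1];
                        topNames[k] = topNames[k - 1];
                    }
                    topCounts[slot] = wordFrequencies[i];
                    topNames[slot] = wordNames[i];
                    break;
                }
            }
        }

        for (int i = 0; i < topN; i++)
            if (topNames[i] != null)
                Console.WriteLine("{0}\t{1}", topNames[i], topCounts[i]);
    }
}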
-
That's a pretty long-winded and bog standard solution, isn't it?
I was asking if there were an elegant and perhaps clever algorithmic way.
I mean, yeah, walk through the list, count them and then sort by frequency.
Even Milan could have thought of that one.
I was hoping someone would know of a cutting-edge algorithm that trades off memory for very little CPU use.
-
Originally posted by Churchill: Quantify "OK"...
Given that the text has to be obtained from somewhere and then split into the words it may well be that stuffing them into a hashtable is quick enough.
If it takes, say, 500 ms to get the text and 100 ms to add them, then it's probably OK. If it's the inverse, then it probably isn't. I'm sure you could produce something clever that might be better.
Also I just did a search, here is a possible example:-
http://blogs.vbcity.com/mcintyre/arc...2/11/7343.aspx
OK, it's not 1.1 but it would be pretty easy to test the performance.
-
Originally posted by ASB: Just return:-
"the" "of" "to" "is" "it" and you'll be right quite a lot of the time.
Personally I think I'd just load them into a hashtable and see what the performance was like. If it was OK I wouldn't worry about it.
How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?
Great answer with insightful analysis, I can see why they pay you £200/day.