I was thinking more along the lines of sorting them by char length, then searching on the first character and iterating down each char of the string.
Any word whose char count occurs only once can be discarded, and then you can start filtering among the words that share the same number of chars.
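Just to make that concrete, here is a rough sketch of the length-bucketing part (my own illustration, not from the thread, and it assumes generics are available, i.e. .NET 2 rather than 1.1). A word whose length occurs only once can never have a duplicate, so only the surviving buckets would need the char-by-char comparison.
Code:
using System;
using System.Collections.Generic;

class LengthBuckets
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "cat", "horse", "dog", "cat", "dodo", "dog" };

        // Bucket the words by length.
        Dictionary<int, List<string>> byLength = new Dictionary<int, List<string>>();
        foreach (string w in words)
        {
            if (!byLength.ContainsKey(w.Length))
                byLength[w.Length] = new List<string>();
            byLength[w.Length].Add(w);
        }

        // A bucket holding a single word can be discarded outright:
        // that word cannot have a duplicate anywhere in the list.
        foreach (KeyValuePair<int, List<string>> bucket in byLength)
        {
            if (bucket.Value.Count > 1)
                Console.WriteLine("length {0}: {1} candidates to compare char by char",
                                  bucket.Key, bucket.Value.Count);
        }
    }
}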
Previously on "Sorting data by frequency in C#"
-
Dunno. I'd just use an efficient sort like Shell-Metzner and count the number in each block of identical items. Perhaps instead of sorting on the entire string you could sort on just the 1st x chars, which would give you many unique items that did not need to be sorted further and could be discarded. Those that did, sort on the next 3 chars, and so on.
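As a minimal sketch of that sort-then-count-runs idea (my own illustration, leaving out the first-x-chars optimisation; the sample words array is made up): sort once so identical words sit next to each other, then a single pass counts each run.
Code:
using System;

class SortAndCount
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "the", "cat", "sat", "on", "the", "cat" };

        // Sort so identical words end up adjacent.
        Array.Sort(words, StringComparer.Ordinal);

        // One pass over the sorted array: count the length of each run.
        int run = 1;
        for (int i = 1; i <= words.Length; i++)
        {
            if (i < words.Length && words[i] == words[i - 1])
            {
                run++;
            }
            else
            {
                Console.WriteLine("{0}\t{1}", words[i - 1], run);
                run = 1;
            }
        }
    }
}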
-
Go through the list once to work out the minimum and maximum length. Then if, for example, the minimum is 3 and the maximum is 5, you do a search for "aaa", "aaaa", and "aaaaa", then move on to "aaaab", until you've exhausted every possible word.
HTH.
-
Originally posted by Churchill: Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
For the terminally bored there is some discussion here:-
http://forum.java.sun.com/thread.jsp...sageID=4319661
-
Obviously the most efficient way would be to use a binary tree and counter mechanism for the elements.
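Churchill doesn't spell out the implementation, but as a rough sketch the "binary tree and counter mechanism" might look something like this, using .NET 2's SortedDictionary (a red-black tree internally) to stand in for the tree; the sample words array is made up.
Code:
using System;
using System.Collections.Generic;

class TreeCounter
{
    static void Main()
    {
        // Made-up sample input.
        string[] words = { "the", "cat", "sat", "on", "the", "mat", "the" };

        // SortedDictionary is a binary search tree keyed on the word,
        // with the counter as the value. Each insert/lookup is O(log n).
        SortedDictionary<string, int> counts = new SortedDictionary<string, int>();
        foreach (string w in words)
        {
            int current;
            counts.TryGetValue(w, out current); // 0 if the word isn't there yet
            counts[w] = current + 1;
        }

        // Keys come back in sorted word order; ordering by frequency would
        // still need a second pass, e.g. copying into parallel arrays as in
        // the Hashtable example further down the thread.
        foreach (KeyValuePair<string, int> pair in counts)
            Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
    }
}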
-
written before the request that it should be written in binary and injected straight into the processor!
Code:
// Count occurrences of each word in a Hashtable (word -> count).
Hashtable wordMap = new Hashtable();
string[] words = <your array of words here>;
foreach (string currentWord in words)
{
    if (currentWord.Length > 0)
    {
        if (wordMap.ContainsKey(currentWord))
        {
            wordMap[currentWord] = (int)wordMap[currentWord] + 1;
        }
        else
        {
            wordMap.Add(currentWord, 1);
        }
    }
}

// Copy keys and counts into parallel arrays, then sort both by frequency.
string[] wordNames = (string[])new ArrayList(wordMap.Keys).ToArray(typeof(string));
int[] wordFrequencies = (int[])new ArrayList(wordMap.Values).ToArray(typeof(int));
Array.Sort(wordFrequencies, wordNames);

for (int currentWord = 0; currentWord < wordNames.Length; currentWord++)
{
    Console.WriteLine(wordNames[currentWord] + "\t" + wordFrequencies[currentWord].ToString());
}
-
Originally posted by DimPrawn: Please read the spec:
Just asking opinions on the most efficient and elegant way to achieve the following.
HTH
Yes, I did read that bit. And I conformed with it. The person who is not conforming is you.
What you are in effect saying at the moment is that a "standard" way of dealing with it is neither the most efficient nor the most elegant.
You may turn out to be right - but you do not yet know that, and since you will not try a standard method you will never know how efficient it was (or wasn't).
-
Doesn't anyone study algorithms anymore? A couple I can think of off the top of my head are Boyer-Moore (or a variation thereof) - http://en.wikipedia.org/wiki/Boyer-Moore - and KMP - http://en.wikipedia.org/wiki/Knuth%E...ratt_algorithm
That should give you a head start.
-
Originally posted by ASB: Yes, it is bog standard. But does it work? Yes.
Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.
Just asking opinions on the most efficient and elegant way to achieve the following.
HTH
-
This might get you on the road to a clever solution:-
http://www.awprofessional.com/articl...&seqNum=8&rl=1
-
Originally posted by DimPrawn: That's a pretty long-winded and bog standard solution, isn't it?
I was asking if there were an elegant and perhaps clever algorithmic way.
I mean, yeah, walk through the list, count them and then sort by frequency.
Even Milan could have thought of that one.
I was hoping someone would know of a cutting-edge algorithm that trades off memory for very little CPU use.
Now it may be that it doesn't work well enough, but I guess you'll never find out since it's rejected simply on the grounds of not being clever.
-
Originally posted by DimPrawn: I said all the stop words have been removed beforehand.
How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?
Great answer with insightful analysis, I can see why they pay you £200/day.
I dug out an example which I put a link to (yes I know it's .NET 2).
I'm sure you can manage the sort it will require, though it would probably be quickest just to scan the counts holding the top 5 in another array then extract those elements.
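For what that "hold the top 5 in another array" suggestion might look like, here's a rough sketch (my own illustration; the wordNames/wordFrequencies parallel arrays are assumed to come out of the Hashtable example posted above). One pass keeps the five largest counts without sorting everything.
Code:
using System;

class TopFive
{
    static void Main()
    {
        // Made-up parallel arrays, as produced by the Hashtable example.
        string[] wordNames = { "the", "cat", "sat", "on", "mat", "a", "dog" };
        int[] wordFrequencies = { 42, 7, 3, 11, 5, 30, 2 };

        const int topN = 5;
        string[] topNames = new string[topN];
        int[] topCounts = new int[topN];

        // Single pass: slot each word into the small top-5 arrays,
        // shifting lower entries down. No full sort of the counts needed.
        for (int i = 0; i < wordNames.Length; i++)
        {
            for (int slot = 0; slot < topN; slot++)
            {
                if (wordFrequencies[i] > topCounts[slot])
                {
                    // Shift everything below this slot down one place.
                    for (int k = topN - 1; k > slot; k--)
                    {
                        topCounts[k] = topCounts[k - 1];
                        topNames[k] = topNames[k - 1];
                    }
                    topCounts[slot] = wordFrequencies[i];
                    topNames[slot] = wordNames[i];
                    break;
                }
            }
        }

        for (int i = 0; i < topN; i++)
            if (topNames[i] != null)
                Console.WriteLine("{0}\t{1}", topNames[i], topCounts[i]);
    }
}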
-
That's a pretty long-winded and bog standard solution, isn't it?
I was asking if there were an elegant and perhaps clever algorithmic way.
I mean, yeah, walk through the list, count them and then sort by frequency.
Even Milan could have thought of that one.
I was hoping someone would know of a cutting-edge algorithm that trades off memory for very little CPU use.
-
Originally posted by Churchill: Quantify "OK"...
Given that the text has to be obtained from somewhere and then split into the words it may well be that stuffing them into a hashtable is quick enough.
If it takes, say, 500 ms to get the text and 100 ms to add them, then it's probably OK. If it's the inverse, then it probably isn't. I'm sure you could produce something clever that might be better.
Also I just did a search, here is a possible example:-
http://blogs.vbcity.com/mcintyre/arc...2/11/7343.aspx
OK, it's not 1.1 but it would be pretty easy to test the performance.
-
Originally posted by ASB: Just return:-
"the" "of" "to" "is" "it" and you'll be right quite a lot of the time.
Personally I think I'd just load them into a hashtable and see what the performance was like. If it was OK I wouldn't worry about it.
How does "loading them into a hashtable" somehow remove the duplicates and sort them by frequency?
Great answer with insightful analysis, I can see why they pay you £200/day.