SKA news
-
It's fairly easy so long as you have a scalable crawler; the real problem is that a handful of very large sites have got hundreds of millions of URLs on them.
Twitter and their ilk can also relax ACID constraints in certain ways compared to a more conventional DB, i.e. it doesn't matter if every user sees the latest tweets from every other user at the same time; no one will notice if they are delayed by 500ms or if they don't get the exact same ordering twice in a row. I think it's called BASE (as opposed to ACID).
Originally posted by AtW:
Front end stuff is very easy to run in parallel very cheaply; it's the large scale DB that is a problem for companies like Twitter, Facebook, Google et al.
The real-time nature of Twitter certainly made it harder to implement than the usual batch processing; however, the inherent advantages of small text size and a write-once, read-many approach make their problem fairly trivial to solve.
It's all really a matter of perspective: when you spend your own £50k on stuff like this you have to be smart, but when you want to raise hundreds of millions, making the problem easily solvable will backfire.
While you're waiting, read the free novel we sent you. It's a Spanish story about a guy named 'Manual.'
-
I think the problem is that you end up looking at all systems like your own, and the solution to your issues becomes the solution to their issues.
Originally posted by minestrone:
You seem to think running websites is purely down to DB datasize.
Mind you, most issues seem to boil down to one of two areas:
- getting data into the database
- getting data out of the database
The latter is more interesting, as once you decide that 100% this-millisecond accuracy isn't important you can take a lot of shortcuts to speed up handling the data.
merely at clientco for the entertainment
-
Aye.
Originally posted by doodab:
Twitter and their ilk can also relax ACID constraints in certain ways compared to a more conventional DB
Twitter does not even need to work 100% of the time!!!
-
Offer a cloud service where they can host their sites; then you don't need to go crawling them, and you'll always be bang up to date.
Originally posted by AtW:
the real problem is that a handful of very large sites have got hundreds of millions of URLs on them.
I wonder if Google or M$ have thought of that yet.
Feist - 1234. One camera, one take, no editing. Superb. How they did it
Feist - I Feel It All
Feist - The Bad In Each Other (Later With Jools Holland)
-
Bollocks.
Originally posted by AtW:
Front end stuff is very easy to run in parallel very cheaply.
If I have 1 table with one text field of 140 chars, if that gets accessed 1 million times in 1 second that is easier to run than 1 person accessing 140 million chars in one second.
You talk the biggest pile of crap, truly you seem to know jack tulip my simple mathematically challenged friend.
-
Accessing 140 mln chars in one second would require roughly 1 Gbit/s of connectivity, and it's done trivially if you have the required bandwidth (and low enough latency).
Originally posted by minestrone:
If I have 1 table with one text field of 140 chars, if that gets accessed 1 million times in 1 second that is easier to run than 1 person accessing 140 million chars in one second.
1 mln accesses to 140 chars over TCP/IP might actually be a more difficult problem if lots of separate IPs are involved, but in such scenarios having 100 cheap boxes would reduce the problem to 10k accesses each per second, which is doable.
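For anyone wanting to check the figures in the exchange above, here is a rough back-of-envelope sketch. It assumes one byte per character and ignores TCP/IP and HTTP overhead entirely; the function names are made up for illustration.

```python
# Back-of-envelope check of the numbers quoted in the thread.
# Assumptions (not from the thread): 1 byte per character, no protocol overhead.
BITS_PER_CHAR = 8


def bulk_read_gbit_per_s(total_chars: int) -> float:
    """Bandwidth needed to move total_chars in one second, in Gbit/s."""
    return total_chars * BITS_PER_CHAR / 1e9


def requests_per_box(total_requests: int, boxes: int) -> float:
    """Per-machine request rate if load is spread evenly across boxes."""
    return total_requests / boxes


if __name__ == "__main__":
    # One client pulling 140 million chars in a second.
    print(bulk_read_gbit_per_s(140_000_000))  # 1.12
    # One million 140-char reads spread over 100 machines.
    print(requests_per_box(1_000_000, 100))   # 10000.0
```

Note that both scenarios move the same 140 million characters of payload in total (1 million x 140). What differs is the per-request overhead, which is the part the 100-box split spreads out.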
-
Can I just ask what you think is more problematic for a web server?
Originally posted by AtW:
Accessing 140 mln chars in one second would require 1 Gbit connectivity and it's done trivially if you have required bandwidth (and low enough latency).
1 mln accesses to 140 chars over TCP/IP might actually be more difficult problem if lots of separate IPs are involved but in such scenarios having 100 cheap boxes would reduce the problem to 10k accesses each per second which is doable.
"1 table with one text field of 140 chars if that gets accessed 1 million times in 1 second"
"1 person accessing 140 million chars in one second"
-
You forget. AtW has solved his problem by using a hammer, so his immediate solution to all problems is now that hammer. And if it doesn't work, to buy a bigger hammer.
Originally posted by minestrone:
Can I just ask what you think is more problematic for a web server.
"1 table with one text field of 140 chars if that gets accessed 1 million times in 1 second"
"1 person accessing 140 million chars in one second"
To be honest that statement is true of most people. They will take a working solution and try and apply it to the next problem that comes along.
Edit to answer the question.
The second one could be a problem based on the size of the network connection.
The first one is a problem for the database but less so if the database keeps recent statements in memory.
Memcache solves the first issue very well. Stack Overflow halved the number of machines they require by caching database results for 3 seconds. I'm sure most popular sites would do the same.
Last edited by eek; 9 September 2011, 15:14.
merely at clientco for the entertainment
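To illustrate the idea of caching database results for a few seconds, here is a minimal in-process sketch. It is a stand-in for the concept only, not memcached and not how any particular site does it; `TTLCache`, `get_or_load` and the loader callback are hypothetical names, and the 3-second lifetime is simply the figure mentioned above.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny time-to-live cache: serve a stored value until it is ttl_seconds old."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() only if missing or stale."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: skip the database
        value = loader()     # hypothetical expensive query
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    db_hits = 0

    def fetch_latest() -> str:
        global db_hits
        db_hits += 1
        return "latest 140-char message"

    cache = TTLCache(ttl_seconds=3.0)
    for _ in range(1000):
        cache.get_or_load("latest", fetch_latest)
    print(db_hits)  # the loader runs once if the loop finishes within 3 seconds
```

The trade-off is the one discussed earlier in the thread: readers may see data up to 3 seconds old, in exchange for the database seeing one query per key per window instead of one per request.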