Network speed - sanity check please

    #11
    Originally posted by Platypus
    I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.

    Am I right in thinking that theoretically:

    1 gigabit / second = 1024*1024*1024 = 1,073,741,824 bits/second = 134,217,728 bytes per second

    Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).
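
    A quick way to double-check those two figures at the shell (assuming bc is installed - and noting that this takes 1Gb = 2^30 bits, as above, though gigabit Ethernet is strictly 10^9 bits/second):

    $ echo '1024^3 / 8' | bc
    134217728
    $ echo '1024^3 / 8 * 80 / 100' | bc
    107374182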

    Ok so far?

    My 7TB of data is made up of 7 million files in 5 million directories. Assuming I plan to use "tar" to roll it all up and squirt over the network ...

    tar cf - mydir | rsh remotehost tar xf -
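
    As an aside, dropping pv into the middle of that pipe (assuming pv is installed on the sending machine) would show the actual throughput while it runs, so the theory can be checked against reality:

    tar cf - mydir | pv | rsh remotehost tar xf -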

    Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each - the standard tar block size). So the amount of tar data is 7 million files x 2048 blocks = 14,336,000,000 data blocks, plus 7 million file header blocks, plus 5 million directory blocks = 14,348,000,000 blocks. Add roughly one extra block per file for padding (another 7 million) and you get 14,355,000,000 blocks = 7.349 TB of data.
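
    Checking that at the shell as well (bc again; the byte count comes out in decimal, i.e. 1 TB = 10^12 bytes):

    $ echo '7000000 * 2048 + 7000000 + 5000000' | bc
    14348000000
    $ echo '(14348000000 + 7000000) * 512' | bc
    7349760000000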

    OK so far?

    Googling around, it seems that about 95% of the traffic sent over TCP/IP is "payload", with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.

    7.736TB of raw data at 107,374,182 bytes/second will take approximately 72,000 seconds = 20 hours
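
    The last step at the shell (taking 7.736TB as 7,736,000,000,000 bytes):

    $ echo '7736000000000 / 107374182' | bc
    72047
    $ echo '72047 / 3600' | bc
    20

    72,047 seconds is just over 20 hours.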

    Any networking bods care to comment and tell me if this looks about right?

    Thanks!
    Uhhhm...

    Odd question, but why exactly are you trying to squeeze that much out of rsh and the stack?

    Surely there have to be more efficient ways to do this (Fibre Channel or similar SANs spring immediately to mind, as do direct mirroring and clustering).

    tar & the Berkeley stack portions are always going to be Heath Robinson in implementation (as applied). To take your example:

    $ tar cf - mydir | rsh remotehost tar xf -

    Assuming Berkeley / UCB tar, with the archive of ./mydir streamed over stdin to a second tar extracting on the remote machine...

    You're aware that this will essentially be an Epic Fail, as well as grossly inefficient? 7TB warrants a backup network, mirror or cluster.

    Save yourself the hassle and get a second opinion from an HACMP / clustering expert. Or purchase decent backup software and a secondary fabric.

    Alternatively, install wget and make / readable by the target uid:gid - and be removed from an ops capacity :-)
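
    Failing all that, rsync is a common route for this sort of job and can at least resume after a dropped link. A minimal sketch, assuming rsync is installed on both ends (the /destination path is a placeholder):

    rsync -a --partial mydir/ remotehost:/destination/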

    Cheers,

    PDCCH
