Network speed - sanity check please

    #11
    Originally posted by Platypus
    I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.

    Am I right in thinking that theoretically:

    1 gigabit / second = 1024*1024*1024 = 1,073,741,824 bits/second = 134,217,728 bytes per second

    Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).
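
    A quick way to double-check those two figures at the shell (assuming bc is installed - and noting that this takes 1Gb = 2^30 bits, as above, though gigabit Ethernet is strictly 10^9 bits/second):

    $ echo '1024^3 / 8' | bc
    134217728
    $ echo '1024^3 / 8 * 80 / 100' | bc
    107374182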

    Ok so far?

    My 7TB of data is made up of 7 million files in 5 million directories. Assuming I plan to use "tar" to roll it all up and squirt over the network ...

    tar cf - mydir | rsh remotehost tar xf -
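
    As an aside, dropping pv into the middle of that pipe (assuming pv is installed on the sending machine) would show the actual throughput while it runs, so the theory can be checked against reality:

    tar cf - mydir | pv | rsh remotehost tar xf -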

    Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each - the standard tar block size). So the amount of tar data is 7 million files x 2048 blocks = 14,336,000,000 data blocks, plus 7 million file header blocks, plus 5 million directory blocks = 14,348,000,000 blocks. Add roughly one extra block per file for padding (another 7 million) and you get 14,355,000,000 blocks = 7.349 TB of data.
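
    Checking that at the shell as well (bc again; the byte count comes out in decimal, i.e. 1 TB = 10^12 bytes):

    $ echo '7000000 * 2048 + 7000000 + 5000000' | bc
    14348000000
    $ echo '(14348000000 + 7000000) * 512' | bc
    7349760000000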

    OK so far?

    Googling around, it seems that about 95% of the traffic sent over TCP/IP is "payload", with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.

    7.736TB of raw data at 107,374,182 bytes/second will take approximately 72,000 seconds = 20 hours
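
    The last step at the shell (taking 7.736TB as 7,736,000,000,000 bytes):

    $ echo '7736000000000 / 107374182' | bc
    72047
    $ echo '72047 / 3600' | bc
    20

    72,047 seconds is just over 20 hours.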

    Any networking bods care to comment and tell me if this looks about right?

    Thanks!
    Uhhhm...

    Odd question, but why exactly are you trying to squeeze that much out of rsh and the stack?

    Surely there have to be more efficient ways to do this (Fibre Channel or similar SANs spring immediately to mind, as do direct mirroring and clustering).

    tar & the Berkeley stack portions are always going to be Heath Robinson in implementation (as applied). To take your example:

    $ tar cf - mydir | rsh remotehost tar xf -

    Assuming Berkeley / UCB tar, with the archive of ./mydir streamed over stdin to a second tar extracting on the remote machine...

    You're aware that this will essentially be an Epic Fail, as well as grossly inefficient? 7TB warrants a backup network, mirror or cluster.

    Save yourself the hassle and get a second opinion from an HACMP / clustering expert. Or purchase decent backup software and a secondary fabric.

    Alternatively, install wget and make / readable by the target uid:gid - and be removed from an ops capacity :-)
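
    Failing all that, rsync is a common route for this sort of job and can at least resume after a dropped link. A minimal sketch, assuming rsync is installed on both ends (the /destination path is a placeholder):

    rsync -a --partial mydir/ remotehost:/destination/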

    Cheers,

    PDCCH
