Previously on "Network speed - sanity check please"

  • PDCCH
    replied
    Originally posted by Platypus View Post
    I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.

    Am I right in thinking that theoretically:

    1 Gigabit / second = 1024*1024*1024 bits/second = 1,073,741,824 bits/second = 134,217,728 bytes per second

    Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).

    Ok so far?

    My 7TB of data is made up of 7 million files in 5 million directories. Assuming I plan to use "tar" to roll it all up and squirt over the network ...

    tar cf - mydir | rsh remotehost tar xf -

    Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each - the standard tar block size). So the amount of tar data is 7 million files x 2048 data blocks + 7 million file header blocks + 5 million directory blocks = 14,348,000,000 blocks, plus say one extra block per file (another 7 million) = 14,355,000,000 blocks = 7.349 TB of data

    OK so far?

    Googling around it seems that about 95% of the traffic sent over TCP/IP is "payload" with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.

    7.736TB of raw data at 107,374,182 bytes/second will take 72,000 (approx) seconds = 20 hours

    Any networking bods care to comment and tell me if this looks about right?

    Thanks!
    Uhhhm...

    Odd question, but why exactly are you trying to squeeze that much out of rsh and the stack?

    Surely there have to be more efficient ways to do this (Fibre Channel or similar SANs spring immediately to mind, as do direct mirroring and clustering).

    tar & the Berkeley stack portions are always going to be Heath Robinson in implementation (as applied). To take your example:

    $ tar cf - mydir | rsh remotehost tar xf -

    Assuming Berkeley / UCB tar, assuming STDIN of ./mydir piped to extracting on remote machine...

    You're aware that this will essentially be Epic Fail as well as grossly inefficient? 7TB warrants a backup network, mirror or cluster.

    Save yourself the hassle, get a second opinion from an HACMP / Clustering expert. Or, purchase decent backup software and a secondary fabric.

    Alternatively, install wget and make / readable by target uid:gid - and be removed from an ops capacity :-)

    Cheers,

    PDCCH

  • vetran
    replied
    iperf - real network speed
    rsync - makes 7 GB look like 100 MB of changes.
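
    For what it's worth, a rough sketch of how those two might be used here (the hostname and paths are placeholders, not from the thread):

    # measure the real end-to-end TCP throughput first (run "iperf -s" on the receiving box)
    iperf -c remotehost

    # rsync only sends missing/changed files, so an interrupted run can simply be
    # restarted and it picks up where it left off
    rsync -a --partial /path/to/mydir/ remotehost:/path/to/dest/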

  • Platypus
    replied
    Originally posted by Moscow Mule View Post
    and milan...
    don't confuse my quality posts with the other fella's nonsense

    P.S. even with the tape drive, would take approx 10 hours

  • Moscow Mule
    replied
    Just get one of these,

    http://www-03.ibm.com/systems/storag...130/index.html

    and milan...

    Much easier than fiddling with all that networky stuff

  • Platypus
    replied
    Thanks everyone for the comments.

    I'm told that the network link between the two machines will be dedicated, with no other traffic. As for other topology issues, if you mean whether the traffic will be routed, switched or whatever, then I just don't know.

    I take the point about compression, but I'm wondering whether the CPU overhead of that might then become the bottleneck instead of the network. This would need to be tested beforehand to see whether compression speeds the process up or slows it down (a rough test is sketched at the end of this post).

    I'm assured none of the files is > 4 GB. Nevertheless, the point is that if the process halts or crashes unexpectedly, recovery will be a manual hack rather than an automatic pick-up where we left off. This is the aspect that disturbs me the most.

    You're right that a test on a subset is the only way to get a real metric, but the client has asked for a theoretical "ballpark" number to qualify the situation. If the answer had been "1 week", the approach would be fatally flawed for this situation; an answer of "1 day", however, means it makes sense to explore this further.

    Finally, thanks for correcting the maths error!

    Cheers guys - very useful feedback
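
    On the compression question above: a quick way to test it beforehand might be something like the following sketch, where sample.dat stands for a representative chunk of the data. If single-core gzip throughput comes out well below the ~107 MB/s wire figure, compression in the pipe is likely to become the bottleneck rather than the network.

    # rough gzip throughput test on a representative sample file (fastest compression level)
    time gzip -1 -c sample.dat > /dev/null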

  • jatinder
    replied
    You didn't say whether the files were compressed or not.

    I'd suggest compressing the output of tar as in:

    tar cf - mydir | gzip | rsh remotehost 'gunzip | tar xf -'

    or something similar.

    --Jatinder

  • bren586
    replied
    I have used this to move a database from one machine to another.

    2 things to watch out for:

    Lots of small files slooooows the whole thing down

    Tar will error on files larger than 4 GB and not transfer them. You need to trap the error(s) and move them manually - not forgetting to change the ownership and so on.
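
    One possible way to handle that (a sketch, assuming GNU find and rsync are available; remotehost and the destination path are placeholders):

    # list the files that would hit the 4 GB limit mentioned above
    find mydir -type f -size +4G > bigfiles.txt

    # copy them separately, preserving ownership and permissions (rsync -a)
    rsync -a --files-from=bigfiles.txt ./ remotehost:/path/to/dest/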

  • Durbs
    replied
    Can't really be calculated like that, as it totally depends on the topology, what kit is in place, what other traffic there is, etc., as has been said.

    You're realistically not going to get anywhere near the max speeds of a gigabit network as those are only theoretical.

    Best bet is to copy a subset of the data, time it and scale up accordingly.
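
    A sketch of that approach (mydir/subdir is a placeholder for a representative ~10 GB subset, and the timing below is a made-up example):

    # time a copy of the subset using the same method as the full transfer
    time tar cf - mydir/subdir | rsh remotehost tar xf -

    # then scale up: e.g. if 10 GB took about 100 seconds, 7 TB (~7000 GB) works out to roughly
    echo "7000 / 10 * 100 / 3600" | bc -l    # ~19.4 hours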

  • TheRefactornator
    replied
    Your numbers stack up except:

    7.736 terabytes == 8,505,821,952,475 bytes (taking 1 TB = 1024^4 bytes)

    So the theoretical transfer time at 107,374,182 bytes/second is approx 79,200 seconds = 22 hrs.

    I'm not a networking type; this is just running the numbers.
    Scooterscot is right that other networking factors may have an effect here. The other thing to consider is whether performing a tar on the files will introduce another overhead. I expect it will for millions of files, but I don't know how to quantify that.
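
    For reference, the same arithmetic checked with bc, using 1 TB = 1024^4 bytes as above:

    echo "7.736 * 1024^4" | bc                          # 8505821952475.136 bytes
    echo "7.736 * 1024^4 / 107374182 / 3600" | bc -l    # ~22.0 hours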

  • scooterscot
    replied
    Any other users on the network during the transfer?

    Has the network topology been considered?

  • Platypus
    started a topic Network speed - sanity check please

    Network speed - sanity check please

    I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.

    Am I right in thinking that theoretically:

    1 Gigabit / second = 1024*1024*1024 bits/second = 1,073,741,824 bits/second = 134,217,728 bytes per second

    Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).

    Ok so far?

    My 7TB of data is made up of 7 million files in 5 million directories. Assuming I plan to use "tar" to roll it all up and squirt over the network ...

    tar cf - mydir | rsh remotehost tar xf -

    Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each - the standard tar block size). So the amount of tar data is 7 million files x 2048 data blocks + 7 million file header blocks + 5 million directory blocks = 14,348,000,000 blocks, plus say one extra block per file (another 7 million) = 14,355,000,000 blocks = 7.349 TB of data

    OK so far?

    Googling around it seems that about 95% of the traffic sent over TCP/IP is "payload" with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.

    7.736TB of raw data at 107,374,182 bytes/second will take 72,000 (approx) seconds = 20 hours

    Any networking bods care to comment and tell me if this looks about right?

    Thanks!
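
    A quick way to sanity-check the arithmetic above from a shell (a rough sketch using bc):

    echo "1024^3 / 8" | bc             # 134217728 bytes/second at 1 Gbit/s
    echo "1024^3 / 8 * 8 / 10" | bc    # 107374182 bytes/second at 80% efficiency
    echo "7000000*2048 + 7000000 + 5000000 + 7000000" | bc            # 14355000000 tar blocks
    echo "(7000000*2048 + 7000000 + 5000000 + 7000000) * 512" | bc    # 7349760000000 bytes, i.e. ~7.35 TB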