
Copy & md5 Shell Script

I am working on a new copy tool written from scratch in C/C++. It is faster and has a lot more features. It has a command line tool and a full GUI version. It runs on OSX, Windows, and Linux. Should have a test version available soon if anyone is interested. This will not be a free tool like the shell script.


Dusty
 
I would love to try it out! Not completely happy with any current copy tools. Cross-platform is great. Things I would like to have:
Simple text file with checksums, can open with QuickLook.
Accurate and detailed real time progress display, showing each drive separately.
Remember destinations (so sound cards and camera mags each go to their respective directories.)
Trigger shell or AppleScript upon completion.

Dusty, I'm curious what you think of storing checksums as extended attributes instead of a text file. That's what YoYotta ID is doing. The idea scares me because copying between different file systems might lose them.
 
I would love this and gladly pay for it too! I paid for Double Data and have been less than impressed. I've actually switched to rsync and have been using that as my primary way of downloading. Really looking forward to using your shell script and seeing your GUI as well (for the less technical people out there).
 
Definitely interested in testing your new software. Currently I use my own shell scripts, which are somewhat derived from yours. I find that better for my way of working than any of the current GUI apps. But am always keen to try out alternatives.
 
I am looking for people who are interested in helping test the new copy tool.

Right now we just have a command line tool on OSX (10.8 and newer) and Windows (only tested on Windows 8).

Looking for technical people who are comfortable with command line.

I really need Windows users to test. I HATE Windows and have hated it for almost 20 years. I think it is the biggest piece of shit OS that exists. That said, I want to support Windows. No other copy tools support Windows and there are a lot of Windows users out there. Just because I do not like Windows does not mean people do not use it.

Right now I am dual booting a 2009 Mac Pro between OSX 10.8 and Windows 8. The software runs much faster on OSX for me, but we are actually developing it on Windows and making tweaks for OSX as needed, so I expected it to run better on Windows. Hopefully it just does not like my test system.

If you are interested in testing the command line version of the tool, PM or email me and I will get you a copy.

I would not use the tool on any important footage or jobs. We have not found any problems, but it has not had enough testing for use on paying jobs yet.


P.S. - if anyone really wants to test the Linux version I can get you that. We made Linux a low priority.



Thanks,
Dusty@sandust.com
 
Thought I would share the current features of the new command line tool:

Version: setcopy v0.9.95
(c) 2014, Sandust

Options:
--help Print usage and exit
--version Print the version and exit
-v [ --verbose ] Execute with verbose output
-e [ --error ] Introduce an error into the output file for testing purposes (USE WITH CAUTION!)
-f [ --force ] Force setcopy to overwrite the destination
-r [ --rename ] arg Tell setcopy to rename the target folder
-s [ --source ] arg The source directory to copy from (requires quotes around source directory)
-t [ --target ] arg The target directory to copy to (requires quotes around target directory)
-i [ --target1 ] arg The intermediate target directory to copy to first (requires quotes around target directory)
-h [ --hash ] arg (none|crc32+|md5|md5+) The hash function used to verify the copy (default: md5)
-l [ --log ] arg Generate log files in an additional location (requires quotes around log file directory)
-u [ --user ] arg The name of the user (requires quotes around user name)
-o [ --owner ] arg The name of the production company (requires quotes around production company)
-p [ --project ] arg The Name or Project (requires quotes around the project name)

--------------------------------------------------------
setcopy examples
--------------------------------------------------------

setcopy -v -h md5+ -s "/path/to/source/" -t "/path/to/destination1" -t "/path/to/destination2"

This will copy the /path/to/source/ folder and all files within to both /path/to/destination1 and /path/to/destination2.
This will perform an md5 checksum on the source and on the destination files
This will also print verbose information to the screen
This will write a log file and a source and destination checksum file to each destination


setcopy -v -h none -s "/path/to/source/" -t "/path/to/destination1" -u "Dusty" -o "Sandust" -p "setcopy"

This will copy the /path/to/source/ folder and all files within to /path/to/destination1
This will perform no checksums
This will print verbose information to the screen
This will write a log file to the destination
In the log file it will list Dusty as Data Manager, Sandust as the Production Company, and setcopy as the Project


setcopy -v -h md5+ -s "/path/to/source/NO NAME/" -t "/path/to/destination1" -r "Canon5d_001"

This will copy the /path/to/source/NO NAME/ folder and all files within to /path/to/destination1
This will perform an md5 checksum on the source and on the destination files
This will print verbose information to the screen
This will write a log file to the destination
This will rename NO NAME to Canon5d_001 on all destinations

setcopy -v -h md5+ -s "/path/to/source/" -i "/path/to/destination1" -t "/path/to/destination2"

This will copy the /path/to/source/ folder and all files within to /path/to/destination1
Then setcopy will copy from /path/to/destination1 to /path/to/destination2
It will perform an md5 checksum on the source and on both destination files
It will also print verbose information to the screen
It will write a log file and a source and destination checksum file to each destination
This can be useful if you have a fast raid for destination 1 and slower drive for destination 2

--------------------------------------------------------
NOTES
--------------------------------------------------------

I recommend always using the -v option. If you do not use the -v option setcopy will not print to screen any errors or mismatched checksums

-h none does no checksums (super fast)
-h md5 only checksums the source files (very fast, but you do not know you have a perfect copy on the destination)
-h md5+ does checksum on source and destination
-h crc32+ does checksum on source and destination

Setcopy will NOT overwrite files unless you use the -f or --force option. Instead it will add a number to the end of the folder and increment that number every time.
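For anyone scripting something similar, the no-overwrite rule above can be sketched in a few lines of Python. This is only an illustration, not setcopy's code: the function name `next_free_name` and the `_1`, `_2` suffix format are assumptions, since the post only says a number is added and incremented.

```python
import os
import tempfile


def next_free_name(parent: str, name: str) -> str:
    """Return `name` if it is unused inside `parent`, otherwise name_1, name_2, ...

    The underscore-plus-number format is an assumption for illustration.
    """
    candidate = name
    counter = 0
    while os.path.exists(os.path.join(parent, candidate)):
        counter += 1
        candidate = f"{name}_{counter}"
    return candidate


def demo_names() -> list:
    """Create the same folder name three times without ever overwriting."""
    chosen_names = []
    with tempfile.TemporaryDirectory() as parent:
        for _ in range(3):
            chosen = next_free_name(parent, "A001")
            os.mkdir(os.path.join(parent, chosen))
            chosen_names.append(chosen)
    return chosen_names


print(demo_names())
```

The demo copies nothing; it just shows that a second and third request for the same folder name land on new names instead of clobbering the first.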

Testing versions of setcopy will stop working five days after compile.

********************************************************
--ERROR option
********************************************************

The -e or --error option is for testing only.
This will cause setcopy to randomly corrupt footage as it writes to the destinations to test the checksums and error handling.
This will not change the source material, but it will corrupt the copies.
This option will be removed in the final version

********************************************************
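To see why deliberately corrupting a copy is a useful test, here is a small Python sketch of the idea: flip a single bit in a copy and confirm that the md5 no longer matches the source. The helper names (`md5_hex`, `flip_one_bit`) and the sample data are made up for illustration; this is not how setcopy injects errors internally.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the md5 checksum of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()


def flip_one_bit(data: bytes, byte_index: int) -> bytes:
    """Return a copy of `data` with the lowest bit of one byte flipped."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 0x01
    return bytes(corrupted)


# Stand-in for a source file; the real thing would be camera media.
source = b"camera original footage " * 1000
good_copy = bytes(source)
bad_copy = flip_one_bit(source, 12345)

print("good copy matches:", md5_hex(good_copy) == md5_hex(source))
print("bad copy matches: ", md5_hex(bad_copy) == md5_hex(source))
```

A verification step that reports the bad copy as matching would be broken, which is exactly what an error-injection option lets you check.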



Dusty
 
Thought I would share a little of what I have learned from creating data copy tools. Most of it is common sense, but it is still good knowledge to have when planning for drives and data management.

--------------------------------------------------------
DATA COPY THEORY
--------------------------------------------------------

Speed of setcopy is determined by three things.

1) read speed of the source
2) write speed of the slowest destination
3) speed of cpu to do checksums

If the source can only read at 20MB/s and all the destinations can write at 500MB/s, the copy will never go faster than 20MB/s.

In a normal parallel copy mode, if the source can read at 1000MB/s and one destination can write at 1000MB/s, but the second destination can only write at 20MB/s, the copy will never go faster than 20MB/s

setcopy is designed to use CPUs as best it can. There are some limitations though. md5 checksum can not use multiple cpus for a single checksum. So when doing an md5 checksum, the speed of your cpu will also limit the overall speed. On a 2009 Mac Pro with 2.26GHz CPU the fastest it can do md5 is roughly 330MB/s. So no matter how fast the source and destination disks can read and write, this system can never go faster than about 330MB/s if doing MD5 checksums.
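The three limits above boil down to taking the minimum. A rough Python sketch, using the numbers from this post; the function name `max_copy_speed` is just for illustration, and the model ignores overhead and the extra read needed for destination checksums.

```python
from typing import Optional, Sequence


def max_copy_speed(
    source_read: float,
    dest_writes: Sequence[float],
    checksum_speed: Optional[float] = None,
) -> float:
    """Upper bound on parallel copy speed in MB/s.

    The copy can go no faster than the source read speed, the slowest
    destination write speed, or (if checksums are on) the CPU checksum speed.
    """
    limits = [source_read, min(dest_writes)]
    if checksum_speed is not None:
        limits.append(checksum_speed)
    return min(limits)


# Slow source, fast destinations.
print(max_copy_speed(20, [500, 500]))
# Fast source, one slow destination.
print(max_copy_speed(1000, [1000, 20]))
# Fast disks everywhere, md5 limited to about 330MB/s by the CPU.
print(max_copy_speed(1000, [1000, 1000], checksum_speed=330))
```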

Normally setcopy does one read of the source and writes to all destinations at the same time. This is why a slow destination will slow everything down. Setcopy has the option to use an intermediate destination to speed up the first copy so you can get media back to set quickly without the slow media affecting the speed much. This will do a very fast copy between the 1000MB/s source and the 1000MB/s destination. Then a 20MB/s copy from destination 1 to destination 2. If your media is slow to read or your CPU can not do checksums very fast, there may not be any advantage to using an intermediate destination.
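To put rough numbers on the intermediate destination idea, here is a back-of-the-envelope Python sketch. The 100,000MB card size and the 1000MB/s read speed of destination 1 are assumptions picked for the example, and checksum time is left out entirely.

```python
def parallel_source_busy_seconds(size_mb, source_read, dest_writes):
    """Time the source is tied up when all destinations are written at once."""
    return size_mb / min(source_read, min(dest_writes))


def intermediate_plan_seconds(size_mb, source_read, intermediate_write,
                              intermediate_read, final_write):
    """Return (time until the source is free, time until everything is done)."""
    first_copy = size_mb / min(source_read, intermediate_write)
    second_copy = size_mb / min(intermediate_read, final_write)
    return first_copy, first_copy + second_copy


SIZE_MB = 100_000  # assumed card size for the example

busy = parallel_source_busy_seconds(SIZE_MB, 1000, [1000, 20])
free_at, done_at = intermediate_plan_seconds(SIZE_MB, 1000, 1000, 1000, 20)

print(f"parallel copy: source tied up for {busy:.0f} s")
print(f"intermediate:  source free after {free_at:.0f} s, all done after {done_at:.0f} s")
```

Under these assumptions the total job takes slightly longer with an intermediate destination, but the source media is released after the fast first copy instead of waiting on the slow drive.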

Because CPU has so much impact on copy speeds when doing checksums, we are working on an implementation of crc32 that can thread across multiple CPUs and will hopefully go much faster than md5. Right now it is faster on Windows and okay on OSX. We are also considering other checksum algorithms that are even faster.


Hopefully that information is useful to people.



Thanks,
Dusty
 
I read your post about setcopy. Looking forward to trying it! I'm primarily a mac user, and am well-versed in the command line, although I still think a GUI would be more failsafe for this kind of thing. But anyhow, I'll gladly try it out and give feedback. I have a couple of notes, just based on reading the instructions you posted:

You say "This version of setcopy will stop working on Feb 19 2014 23:38:29." Really?? That's kind of soon.

I have never had a need to use sha1, but it would still be a good option to have.

Would you explain the order of operations? e.g. source checksums simultaneously with the first copy? Or source checksums and destination checksums calculated simultaneously, after the first copy?

Why do you require quotes? Can you make that optional, so that if I know there are no weird characters in filenames, I can save a few keystrokes? And are multiple instances of md5 run simultaneously to use multiple threads?

I would love to see some kind of rsync-ish mode.
--For example: I have a 2nd unit DP shooting B-roll on his DSLR or GoPro, and he won't let me format the CF card because he wants to keep the footage. He periodically dumps footage on my system throughout the day to show the director. I sometimes use rsync in this situation, but it doesn't write a nice log file, and it doesn't use MD5.


Aloha,

Just had someone ask me some questions about the new command line tool and I thought I would share my answers.

The current testing versions will die tonight sometime, but I should have new builds up before the end of the day. We are making the test builds die every five days so people don't use an old build for testing.

I expect everyone will use the GUI and the GUI will have a ton more features. Command line was just an easy way for us to test all the different algorithms we could think of without spending extra time on the GUI for each tested feature.

-----------------------
SHA
-----------------------

sha is so overkill it is a complete waste of time for what we do. Even md5 is WAY overkill. md5 and sha were created so you could put a file out on the wild Internet where hackers could mess with it, then provide users with a checksum of the file so they could verify whether hackers had changed the original file in some way. It has been shown that, with a lot of work, hackers can modify a file and still have it produce a matching md5 checksum. This is why people say md5 is not cryptographically secure.

We copy files from one drive to another. There is never a point where a hacker could corrupt our files on set. We just need to know that file A on drive 1 is an exact copy of file A on drive 2. That is what crc32 was designed for and is perfect for. Most tools do not give you the crc32 hash, so everyone started using md5 in the beginning. Now md5 has become our standard, but it is WAY overkill. We really need to get back to crc32 checksums.

I want a tool that is safe and fast. SHA or any of the other cryptographically secure algorithms are not something I am even interested in.


-----------------------
Order of Operations
-----------------------

-h none = does no checksums. Just does one read on source and multiple writes from that one read

-h md5 = does one read on source. From that read, it does the "source checksum" and writes to multiple destinations

-h md5+ = does one read on source. From that read, it does the "source checksum" and writes to multiple destinations simultaneously. After it writes to a destination, it does a new read for the "destination checksum".

-h crc32+ = does one read on source. From that read, it does the "source checksum" and writes to multiple destinations simultaneously. After it writes to a destination, it does a new read for the "destination checksum".

Source checksums are created as we read the file on the source media.

Destination checksums can not be created until after the data is written. The data must then be read again to get an accurate destination checksum.

All operations are done as parallel as possible. Anything that can be sped up in a new thread is.
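The order of operations above can be sketched in Python. This is a simplified, single-threaded illustration and not setcopy's implementation: the names `copy_verified` and `file_md5` are hypothetical, only the none, md5 and md5+ modes are shown, and the real tool runs these steps in parallel threads.

```python
import hashlib
import os
import tempfile
from contextlib import ExitStack

CHUNK = 1024 * 1024  # read the source in 1MB pieces


def file_md5(path):
    """Read a file back from disk and return its md5 as hex."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source, destinations, mode="md5+"):
    """One read of the source feeds the source checksum and every destination.

    mode "none": no checksums. mode "md5": source checksum only.
    mode "md5+": source checksum, then a fresh read of each destination.
    """
    digest = hashlib.md5() if mode != "none" else None
    with ExitStack() as stack:
        src = stack.enter_context(open(source, "rb"))
        outputs = [stack.enter_context(open(p, "wb")) for p in destinations]
        for chunk in iter(lambda: src.read(CHUNK), b""):
            if digest is not None:
                digest.update(chunk)      # source checksum from the same read
            for out in outputs:
                out.write(chunk)          # multiple writes from that one read
    source_sum = digest.hexdigest() if digest is not None else None
    dest_sums = {}
    if mode == "md5+":
        for path in destinations:
            dest_sums[path] = file_md5(path)  # new read after the write
    return source_sum, dest_sums


def demo():
    """Copy a random test file to two destinations and checksum everything."""
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "clip.mov")
        with open(source, "wb") as handle:
            handle.write(os.urandom(3 * CHUNK + 17))
        dests = [os.path.join(work, "dest1.mov"), os.path.join(work, "dest2.mov")]
        return copy_verified(source, dests, "md5+")


source_sum, dest_sums = demo()
print(source_sum, all(value == source_sum for value in dest_sums.values()))
```

The point of the sketch is the shape: the source is read exactly once, and destination checksums always cost one extra read per destination.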

If you add an intermediate destination, it follows these same basic rules. One thing to note about using an intermediate destination with plain md5: since the second copy does a completely new read on the intermediate destination, you get a complete source checksum and destination1 checksum, but no destination checksums for the other destinations.


-----------------------
quotes required
-----------------------

I say quotes are required because most users do not understand that special characters and spaces cause problems on the command line. If users are in the habit of using quotes, those problems go away. The software works just fine without quotes.


-s "/path/to/this folder/" = will work

-s /path/to/this folder/ = will NOT work (NO SPACES IN PATH)

-s /path/to/this_folder/ = will work

-s/path/to/this_folder/ = will work

--source/path/to/this_folder/ = will work

also

-u "Dustin Cross" = will work and will write Dustin Cross in the Log

-u Dustin Cross = will just write Dustin in the Log

So I only tell users the safest way to do things. If you know what you are doing and want to do other things, that is fine, but for tech support I will ask if you used quotes.
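The quoting behaviour above comes from how the shell splits a command line into arguments before the program ever sees it. Python's standard `shlex.split` follows the same POSIX-style rules, so it makes a handy way to preview the split. The helper name `split_like_shell` is just for this example.

```python
import shlex


def split_like_shell(command_line: str) -> list:
    """Split a command line into arguments using POSIX shell quoting rules."""
    return shlex.split(command_line)


quoted = split_like_shell('setcopy -s "/path/to/this folder/" -u "Dustin Cross"')
unquoted = split_like_shell('setcopy -s /path/to/this folder/ -u Dustin Cross')

print(quoted)
print(unquoted)
```

With quotes, the path and the user name each arrive as one argument. Without them, `folder/` and `Cross` become separate stray arguments, which is why only Dustin ends up in the log.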


-----------------------
RSYNC mode
-----------------------

What do you mean when you say rsync mode? rsync has a lot of features. Do you mean only copy the new files? That feature will be in the GUI version. I guess it would be easy to add that feature to the command line tool, but I really do not expect anyone to use the command line tool once the GUI is available.



Mahalo,
Dusty
 
Aloha,

I am still looking for Windows testers. I know there are lots of people out there working on Windows.

I guess I should have said it, but any testers who do testing and give feedback will get a free license when the final tool is released.

I could use some more OSX testers too.


Thanks,
Dusty
 
I agree SHA is overkill, but MD5 is important. As time goes on, we are backing up more and more data, and the probability of an incorrect hash increases. I don't have the exact math off the top of my head, but there is a small possibility of a bad copy showing a correct crc32 hash.
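For a rough idea of the math, here is a Python sketch under a simplifying assumption: a corrupted copy produces an effectively random checksum, so a 32-bit checksum misses it with probability about 1 in 2^32 and a 128-bit one with about 1 in 2^128. CRCs actually give stronger guarantees for short error bursts, and the figures only apply to copies that were corrupted in the first place, so treat this as an order-of-magnitude estimate.

```python
import math

CRC32_BITS = 32
MD5_BITS = 128


def undetected_probability(bits: int) -> float:
    """Chance that one corrupted copy still matches, for an idealized checksum."""
    return 2.0 ** -bits


def chance_any_missed(bits: int, corrupted_files: int) -> float:
    """Chance that at least one of `corrupted_files` bad copies goes unnoticed.

    Uses log1p/expm1 so tiny probabilities are not rounded away to zero.
    """
    p = undetected_probability(bits)
    return -math.expm1(corrupted_files * math.log1p(-p))


for name, bits in (("crc32", CRC32_BITS), ("md5", MD5_BITS)):
    print(name, undetected_probability(bits), chance_any_missed(bits, 1000))
```

Even with a thousand genuinely corrupted files, the idealized crc32 miss chance stays well below one in a million, while md5's is vastly smaller still.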


Yes, about rsync I meant only copying new files.
 
I have been occasionally having this error show up, but then it seems to continue and work fine. Any input?
/Users/resolve/Desktop/Screen Shot 2014-02-24 at 6.07.57 PM.png

FIRST COPY 100% COMPLETE
du: /Volumes/AUTO_NATION_BACKUP2/DAY1/A_CAM/A005_02244P.RDM: No such file or directory
(standard_in) 1: parse error
(standard_in) 1: parse error
SECOND and THIRD COPY % COMPLETE/Applications/CopyVerified.app/Contents/Resources/CopyVerified.sh: line 549: [: -le: unary operator expected
SOURCE CHECKSUMS 100% COMPLETE
 
Yeah, I typed the error message below the screenshot in case it didn't show.
 

Matt,

That is a problem with the timing of how I get percent complete. It tries to measure how much of a file is done before the file has started. I have not figured out why it happens sometimes.


Dusty
 
Thanks Dustin. It should work fine though right? It's been working great besides that one issue.
 
Just saw your latest version email. I will test soon. But it reminded me of another feature request: With the GUI, will you have a tool to verify a directory by reading the md5sum files without copying any files? (To be used at post house in the future, for example)
 