Comparing Two Files After File Copy - Performance Improvements?
I've built a file copying routine into a common library for a variety of different (WinForms) applications I'm currently working on. What I've built uses the commonly used CopyFileEx method to perform the actual file copy while displaying progress, which seems to be working great.
The only real issue I'm running into is verification: because most of the file copying I'm doing is for archival purposes, I would like to "verify" the new copy of the file once it has been copied. I have the following methods in place to do the comparison/verification. I'm sure many of you will quickly see where the "problem" is:
Public Shared Function CompareFiles(ByVal File1 As IO.FileInfo, ByVal File2 As IO.FileInfo) As Boolean
    Dim Match As Boolean = False
    If File1.FullName = File2.FullName Then
        Match = True
    Else
        If File.Exists(File1.FullName) AndAlso File.Exists(File2.FullName) Then
            If File1.Length = File2.Length Then
                If File1.LastWriteTime = File2.LastWriteTime Then
                    Try
                        Dim File1Hash As String = HashFileForComparison(File1)
                        Dim File2Hash As String = HashFileForComparison(File2)
                        If File1Hash = File2Hash Then
                            Match = True
                        End If
                    Catch ex As Exception
                        Dim CompareError As New ErrorHandler(ex)
                        CompareError.LogException()
                    End Try
                End If
            End If
        End If
    End If
    Return Match
End Function

Private Shared Function HashFileForComparison(ByVal OriginalFile As IO.FileInfo) As String
    Using BufferedFileReader As New IO.BufferedStream(File.OpenRead(OriginalFile.FullName), 1200000)
        Using MD5 As New System.Security.Cryptography.MD5CryptoServiceProvider
            Dim FileHash As Byte() = MD5.ComputeHash(BufferedFileReader)
            Return System.Text.Encoding.Unicode.GetString(FileHash)
        End Using
    End Using
End Function
This CompareFiles() method checks a few of the "simple" elements first:
- Is it trying to compare a file to itself? (If so, always return True.)
- Do both files actually exist?
- Are the two files the same size?
- Do they both have the same modification date?
But, you guessed it, here's where the performance takes the hit. Especially for large files, the MD5.ComputeHash call in HashFileForComparison() can take a while: about 1.25 minutes for a 500 MB file, for a total of about 2.5 minutes to compute both hashes for the comparison. Does anyone have a better suggestion for how to verify the new copy of the file more efficiently?
vb.net performance md5
edited Nov 15 '18 at 14:29 by G_Hosa_Phat
asked Nov 13 '18 at 17:12 by G_Hosa_Phat
Why not compare the files directly instead of first computing hashes and then comparing those? For only two files, you’re not saving any work. It also makes no sense to convert the MD5 hash to a string before comparing it. Apart from that, you can make your code more readable by rewriting it to exit the function as soon as possible, rather than nesting your If statements. And lastly, writing If ‹condition› Then Variable = True is an anti-pattern. Just write Variable = ‹condition›.
– Konrad Rudolph
Nov 13 '18 at 17:14
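The first comment suggests two changes to the question's code: return early instead of nesting If blocks, and compare the MD5 digests without turning them into a String. A minimal sketch of both, still hashing with MD5, might look like this. The names HashCompare, CompareFiles and HashFile are hypothetical, the original ErrorHandler logging is left out, and the FileInfo.Refresh() calls are an added assumption: FileInfo caches its properties, so objects created before the copy finished could otherwise report stale values.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Security.Cryptography

Public Module HashCompare

    ' Hypothetical rewrite of CompareFiles: cheap checks return early
    ' instead of nesting, and the digests are compared as raw bytes.
    Public Function CompareFiles(info1 As FileInfo, info2 As FileInfo) As Boolean
        ' Comparing a file to itself always matches.
        If info1.FullName = info2.FullName Then Return True

        ' FileInfo caches state; re-read it from disk before checking.
        info1.Refresh()
        info2.Refresh()

        If Not info1.Exists OrElse Not info2.Exists Then Return False
        If info1.Length <> info2.Length Then Return False
        If info1.LastWriteTime <> info2.LastWriteTime Then Return False

        ' Return the condition directly rather than "If ... Then Match = True".
        Return HashFile(info1).SequenceEqual(HashFile(info2))
    End Function

    ' Returns the MD5 digest as a Byte(); no text decoding involved.
    Private Function HashFile(info As FileInfo) As Byte()
        Using stream As FileStream = info.OpenRead()
            Using md5Hasher As MD5 = MD5.Create()
                Return md5Hasher.ComputeHash(stream)
            End Using
        End Using
    End Function

End Module
```

Comparing the bytes also sidesteps a subtle issue with Encoding.Unicode.GetString: a digest is arbitrary binary data, and invalid UTF-16 sequences in it are replaced with U+FFFD when decoded, so two different digests could in principle decode to the same String.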
To be clear, is your suggestion to eliminate the hash comparison entirely? My main intention is to do my best to ensure that the "archive" copy is accurate in case I need to restore it later. Comparing the file size and modification date should generally indicate that the files are "the same", but I want to leave as little room for error as possible.
– G_Hosa_Phat
Nov 13 '18 at 17:20
One of my suggestions is to remove the hash comparison, yes. It’s a detour. Just open both files and compare their contents in chunks. That way you avoid the (somewhat costly) hash computation.
– Konrad Rudolph
Nov 13 '18 at 17:29
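The chunked comparison suggested here can be sketched as follows. This is an illustration under assumptions, not code from the thread: ChunkCompare, SameContent, ReadFull and the 1 MB default buffer are hypothetical choices. Each file is read once, sequentially, and the loop stops at the first differing block.

```vbnet
Imports System
Imports System.IO

Public Module ChunkCompare

    ' Hypothetical helper: compares two files block by block and stops at
    ' the first difference. No hash is computed.
    Public Function SameContent(path1 As String, path2 As String,
                                Optional bufferSize As Integer = 1024 * 1024) As Boolean
        Using s1 As New FileStream(path1, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan)
            Using s2 As New FileStream(path2, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan)
                ' Files of different length can never match; skip reading.
                If s1.Length <> s2.Length Then Return False

                Dim buf1(bufferSize - 1) As Byte
                Dim buf2(bufferSize - 1) As Byte
                Dim read1 As Integer = ReadFull(s1, buf1)
                While read1 > 0
                    Dim read2 As Integer = ReadFull(s2, buf2)
                    If read2 <> read1 Then Return False
                    For i As Integer = 0 To read1 - 1
                        If buf1(i) <> buf2(i) Then Return False
                    Next
                    read1 = ReadFull(s1, buf1)
                End While

                ' s1 is exhausted; the files match only if s2 is as well.
                Return ReadFull(s2, buf2) = 0
            End Using
        End Using
    End Function

    ' Stream.Read may return fewer bytes than requested, so keep reading
    ' until the buffer is full or the end of the stream is reached.
    Private Function ReadFull(stream As Stream, buffer As Byte()) As Integer
        Dim total As Integer = 0
        While total < buffer.Length
            Dim n As Integer = stream.Read(buffer, total, buffer.Length - total)
            If n = 0 Then Exit While
            total += n
        End While
        Return total
    End Function

End Module
```

Because every byte of the copy is checked against the source, this is at least as strict as comparing MD5 digests; it only skips the hashing work. Both approaches still read every byte of both files when they match, so if disk or network reads dominate, the difference in wall-clock time may be small.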
I've tried using a couple of the methods suggested in the thread below but it seems that my current MD5 implementation is actually outperforming them. Even when I "tweak" the buffer sizes and such, the hash method I'm using is still a few seconds (or more) faster with this 500MB file. stackoverflow.com/questions/1358510/…
– G_Hosa_Phat
Nov 13 '18 at 20:45
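When timing two approaches against the same 500 MB file, a small harness like the one below can make the numbers easier to trust. It is a sketch, and CompareTiming and TimeRuns are hypothetical names. It runs a comparison several times and prints every run, because data that was just read may be served from the operating system's file cache on the next read, which can make whichever method runs second look faster.

```vbnet
Imports System
Imports System.Diagnostics

Public Module CompareTiming

    ' Hypothetical helper: runs a comparison `runs` times, prints each
    ' timing, and returns them so a first (possibly uncached) run can be
    ' told apart from later ones.
    Public Function TimeRuns(label As String, runs As Integer, compare As Func(Of Boolean)) As TimeSpan()
        If runs < 1 Then Throw New ArgumentOutOfRangeException(NameOf(runs))
        Dim timings(runs - 1) As TimeSpan
        For i As Integer = 0 To runs - 1
            Dim sw As Stopwatch = Stopwatch.StartNew()
            Dim matched As Boolean = compare.Invoke()
            sw.Stop()
            timings(i) = sw.Elapsed
            Console.WriteLine($"{label} run {i + 1}: {sw.Elapsed.TotalSeconds:F2} s (match = {matched})")
        Next
        Return timings
    End Function

End Module
```

The lambda passed as `compare` wraps whichever check is being measured, for example the question's CompareFiles called on placeholder FileInfo objects for the source and the copy.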
I may be missing something here, but your function HashFileForComparison has the code Using BufferedFileReader .., but you don't use BufferedFileReader; you refer to FileReader in your sample. Is this a typo in the pasted sample or a mistake in your code?
– David Wilson
Nov 15 '18 at 11:29