Comparing Two Files After File Copy - Performance Improvements?

I've built a file copying routine into a common library for a variety of different (WinForms) applications I'm currently working on. It calls the commonly used CopyFileEx function to perform the file copy while displaying the progress, which seems to be working great.

The only real issue I'm encountering is that, because most of the file copying I'm doing is for archival purposes, once the file is copied, I would like to "verify" the new copy of the file. I have the following methods in place to do the comparison/verification. I'm sure many of you will quickly see where the "problem" is:

' Requires Imports System.IO for File.Exists and File.OpenRead.
Public Shared Function CompareFiles(ByVal File1 As IO.FileInfo, ByVal File2 As IO.FileInfo) As Boolean
    Dim Match As Boolean = False

    If File1.FullName = File2.FullName Then
        ' Comparing a file to itself always matches.
        Match = True
    Else
        If File.Exists(File1.FullName) AndAlso File.Exists(File2.FullName) Then
            If File1.Length = File2.Length Then
                If File1.LastWriteTime = File2.LastWriteTime Then
                    Try
                        Dim File1Hash As String = HashFileForComparison(File1)
                        Dim File2Hash As String = HashFileForComparison(File2)

                        If File1Hash = File2Hash Then
                            Match = True
                        End If
                    Catch ex As Exception
                        ' ErrorHandler is the library's own logging class.
                        Dim CompareError As New ErrorHandler(ex)

                        CompareError.LogException()
                    End Try
                End If
            End If
        End If
    End If

    Return Match
End Function

Private Shared Function HashFileForComparison(ByVal OriginalFile As IO.FileInfo) As String
    Using BufferedFileReader As New IO.BufferedStream(File.OpenRead(OriginalFile.FullName), 1200000)
        Using MD5 As New System.Security.Cryptography.MD5CryptoServiceProvider
            Dim FileHash As Byte() = MD5.ComputeHash(BufferedFileReader)

            ' Base64 keeps every hash byte intact; decoding raw hash bytes as
            ' UTF-16 text can replace invalid sequences and lose information.
            Return Convert.ToBase64String(FileHash)
        End Using
    End Using
End Function

This CompareFiles() method checks a few of the "simple" elements first:

Is it trying to compare a file to itself? (if so, always return True)

Do both files actually exist?

Are the two files the same size?

Do they both have the same modification date?

But, you guessed it, here's where the performance takes the hit. Especially for large files, the MD5.ComputeHash call in HashFileForComparison() can take a while: about 1.25 minutes for a 500MB file, so about 2.5 minutes in total to compute both hashes for the comparison. Does anyone have a better suggestion for how to more efficiently verify the new copy of the file?

edited Nov 15 '18 at 14:29

asked Nov 13 '18 at 17:12

G_Hosa_Phat



Why not compare the files directly instead of first computing hashes and then comparing those? For only two files, you’re not saving any work. It also makes no sense to convert the MD5 hash to a string before comparing it. Apart from that you can make your code more readable by rewriting it to exit the function as soon as possible, rather than nesting your If statements. And lastly, writing If ‹condition› then Variable = True is an anti-pattern. Just write Variable = ‹condition›.

– Konrad Rudolph
Nov 13 '18 at 17:14
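
The restructuring suggested in this comment could look roughly like the following. This is only a sketch: the names FileCompareSketch, CompareFilesEarlyExit and HashFile are made up for illustration, the error logging from the question is left out (exceptions simply propagate to the caller), and the hash bytes are compared directly instead of being converted to strings.

```vbnet
Imports System.IO
Imports System.Linq
Imports System.Security.Cryptography

Public Module FileCompareSketch

    ' Hypothetical early-exit variant of CompareFiles: every cheap check
    ' returns as soon as the outcome is known, so nothing is nested.
    Public Function CompareFilesEarlyExit(ByVal File1 As FileInfo, ByVal File2 As FileInfo) As Boolean
        If File1.FullName = File2.FullName Then Return True
        If Not File.Exists(File1.FullName) OrElse Not File.Exists(File2.FullName) Then Return False
        If File1.Length <> File2.Length Then Return False
        If File1.LastWriteTime <> File2.LastWriteTime Then Return False

        ' Return the condition itself and compare the raw hash bytes.
        Return HashFile(File1).SequenceEqual(HashFile(File2))
    End Function

    ' Hypothetical helper: returns the MD5 hash of the file as a byte array.
    Private Function HashFile(ByVal Source As FileInfo) As Byte()
        Using Reader As FileStream = File.OpenRead(Source.FullName)
            Using Hasher As MD5 = MD5.Create()
                Return Hasher.ComputeHash(Reader)
            End Using
        End Using
    End Function

End Module
```

The checks run in the same order as in the question, so the behaviour should be the same apart from the missing Try/Catch.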

To be clear, is your suggestion to eliminate the hash comparison entirely? My main intention is to do my best to ensure that the "archive" copy is accurate in case I need to restore it later. Comparing the file size and modification date should generally indicate that the files are "the same", but I want to leave out as much room for error as possible.

– G_Hosa_Phat
Nov 13 '18 at 17:20


One of my suggestions is to remove the hash comparison, yes. It’s a detour. Just open both files and compare their contents in chunks. That way you avoid the (somewhat costly) hash computation.

– Konrad Rudolph
Nov 13 '18 at 17:29
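
A chunked comparison along these lines might be sketched as follows. The module and function names (ChunkCompareSketch, FilesHaveSameContent, FillBuffer) and the 1 MB default chunk size are assumptions for illustration, not something taken from the question, and whether this beats the hash approach depends on the disks involved.

```vbnet
Imports System.IO

Public Module ChunkCompareSketch

    ' Returns True when both files have the same length and the same bytes.
    Public Function FilesHaveSameContent(ByVal Path1 As String, ByVal Path2 As String,
                                         Optional ByVal ChunkSize As Integer = 1048576) As Boolean
        Using Stream1 As FileStream = File.OpenRead(Path1), Stream2 As FileStream = File.OpenRead(Path2)
            If Stream1.Length <> Stream2.Length Then Return False

            Dim Buffer1(ChunkSize - 1) As Byte
            Dim Buffer2(ChunkSize - 1) As Byte

            Do
                Dim Count1 As Integer = FillBuffer(Stream1, Buffer1)
                Dim Count2 As Integer = FillBuffer(Stream2, Buffer2)

                If Count1 <> Count2 Then Return False
                ' Both streams ended without a difference.
                If Count1 = 0 Then Return True

                ' Stop at the first differing byte.
                For Index As Integer = 0 To Count1 - 1
                    If Buffer1(Index) <> Buffer2(Index) Then Return False
                Next
            Loop
        End Using
    End Function

    ' Stream.Read may return fewer bytes than requested, so keep reading
    ' until the buffer is full or the stream ends.
    Private Function FillBuffer(ByVal Source As Stream, ByVal Target As Byte()) As Integer
        Dim Total As Integer = 0
        While Total < Target.Length
            Dim BytesRead As Integer = Source.Read(Target, Total, Target.Length - Total)
            If BytesRead = 0 Then Exit While
            Total += BytesRead
        End While
        Return Total
    End Function

End Module
```

Unlike hashing, this can return False at the first mismatching chunk, but for files that do match (the expected case after a successful copy) both files still have to be read in full.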

I've tried using a couple of the methods suggested in the thread below but it seems that my current MD5 implementation is actually outperforming them. Even when I "tweak" the buffer sizes and such, the hash method I'm using is still a few seconds (or more) faster with this 500MB file. stackoverflow.com/questions/1358510/…

– G_Hosa_Phat
Nov 13 '18 at 20:45
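
For timing comparisons like the one described in this comment, a small harness along these lines may help keep the measurements consistent. CompareTimingSketch and TimeComparison are hypothetical names, and the sketch only wraps whatever comparison function is passed in.

```vbnet
Imports System
Imports System.Diagnostics

Public Module CompareTimingSketch

    ' Runs the supplied comparison once, prints the result and the elapsed
    ' time, and returns the elapsed time to the caller.
    Public Function TimeComparison(ByVal Label As String, ByVal Comparison As Func(Of Boolean)) As TimeSpan
        Dim Watch As Stopwatch = Stopwatch.StartNew()
        Dim Result As Boolean = Comparison.Invoke()
        Watch.Stop()

        Console.WriteLine("{0}: match={1}, elapsed={2:F2} s", Label, Result, Watch.Elapsed.TotalSeconds)
        Return Watch.Elapsed
    End Function

End Module
```

It could be called as TimeComparison("MD5", Function() CompareFiles(File1, File2)). Keep in mind that the operating system caches recently read files, so whichever method runs second on the same pair of files may look faster than it really is.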

I may be missing something here, but your function HashFileForComparison has the code Using BufferedFileReader .., yet you don't use BufferedFileReader; you refer to FileReader in your sample. Is this a typo in the pasted sample or a mistake in your code?

– David Wilson
Nov 15 '18 at 11:29

vb.net performance md5
