How to segment handwritten and printed digits without losing information in OpenCV?

I've written an algorithm that detects printed and handwritten digits and segments them. However, when I remove the outer rectangle with clear_border from the scikit-image package, the handwritten digit is lost as well. Any suggestions on how to prevent this loss of information?



Sample:
[sample image not reproduced]



How can I get all 5 characters separately?
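The symptom follows from what clear_border does: it removes every connected component that touches the image border. If a handwritten digit touches the outer rectangle, it becomes part of the rectangle's connected component and is removed together with it. A minimal sketch on a made-up synthetic binary image (the shapes and sizes are illustrative assumptions, not taken from the question's sample):

```python
import numpy as np
from skimage.segmentation import clear_border

# Synthetic binary image: True = ink, False = background
img = np.zeros((20, 20), dtype=bool)

# Outer rectangle drawn on the image border
img[0, :] = True
img[-1, :] = True
img[:, 0] = True
img[:, -1] = True

# "Handwritten" stroke whose bottom end touches the rectangle
img[8:19, 5] = True

# "Printed" digit that stays clear of the rectangle
img[5:11, 12:15] = True

# Removes every component connected to the border
cleaned = clear_border(img)

print("rectangle removed:", not cleaned[0].any())           # True
print("touching stroke kept:", cleaned[8:19, 5].any())      # False
print("free-standing digit kept:", cleaned[5:11, 12:15].all())  # True
```

The stroke is lost only because it is 8-connected to the frame; the free-standing block survives.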

  • If I understand your question, you have two problems: 1) the bottom part of the digits may be cropped, and 2) how to segment the digits (objects) from the BW image. Right?

    – Y.AL
    Oct 31 '18 at 9:34











  • Yes, you are right.

    – Zara
    Nov 13 '18 at 6:48

python-3.x opencv image-processing computer-vision digits

asked Oct 25 '18 at 18:11 by Zara, edited Jan 16 at 12:58

2 Answers
Segmenting characters from the image -



Approach -




  1. Threshold the image (convert it to black and white)

  2. Dilate the image

  3. Find the contours and keep only those that are large enough

  4. Find the bounding rectangle of each remaining contour

  5. Take each rectangle as an ROI and save it as a character


Python Code -



# import the necessary packages
import cv2

# load the image, convert it to grayscale, and blur it to remove noise
image = cv2.imread("sample1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (7, 7), 0)

# threshold the image (inverted: characters become white, background black)
ret, thresh1 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# dilate the white portions so each character forms one solid blob
dilate = cv2.dilate(thresh1, None, iterations=2)

# find the external contours in the image
cnts = cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
# OpenCV 2.x and 4.x return (contours, hierarchy);
# OpenCV 3.x returns (image, contours, hierarchy)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

orig = image.copy()
i = 0

for cnt in cnts:
    # check the area of the contour; if it is very small, ignore it
    if cv2.contourArea(cnt) < 100:
        continue

    # bounding rectangle of the filtered contour
    x, y, w, h = cv2.boundingRect(cnt)

    # take the ROI of the contour
    roi = image[y:y + h, x:x + w]

    # mark it on the image if you want
    cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # save the character
    cv2.imwrite("roi" + str(i) + ".png", roi)

    i = i + 1

cv2.imshow("Image", orig)
cv2.waitKey(0)


First, I thresholded the image to convert it to black and white, so the characters are white and the background is black. Then I dilated the image to thicken the characters (the white portions), which makes it easier to find appropriate contours. Next, the findContours method finds the contours. Each contour is then checked for size: if it is not large enough, it is ignored, because such a small contour is noise. The boundingRect method then gives the bounding rectangle of each remaining contour, and finally the detected characters are saved and drawn.



Images (not reproduced here): input image, thresholded image, dilated image, detected contours, and the saved characters (char2, char0, char1, char3).

  • Can you please remove images from your answer because of privacy issues?

    – Zara
    Nov 13 '18 at 6:48






  • Okay, I will replace those images with different ones.

    – Devashish Prasad
    Nov 14 '18 at 6:44

Problem of eroded/cropped handwritten digits:
You can solve this problem in the recognition step, or even in an image-enhancement step before recognition.




  • If only a very small part of the digit is cropped (as in your example image), padding the image by 1 or 2 pixels on each side is enough to make the segmentation easier. A morphological filter (dilation) can further improve the digit after padding. (Both are available in OpenCV.)

  • If a substantial part of the digit is cropped, you need to add degraded/cropped versions of the digits to the training dataset used by the digit-recognition algorithm (e.g. the digit 3 with every possible cropping case).


Problem of character separation:




  • OpenCV offers a blob detection algorithm (SimpleBlobDetector) that should work well for your case; choose suitable values for its convexity parameters (filterByConvexity, minConvexity, maxConvexity).


  • OpenCV also offers an edge detector (the Canny() function) that helps outline your characters. After extracting the contours with findContours, you can fit a polygon to each character with cv2.approxPolyDP(contour,..,..) or get its bounding box with cv2.boundingRect.

Answer "Segmenting characters from the image": answered Nov 1 '18 at 5:28 by Devashish Prasad, edited Nov 14 '18 at 7:08
Answer on cropped digits and character separation: answered Oct 31 '18 at 10:33 by Y.AL