Pandas categorical variable with missing data



























Suppose that I have this dataframe:



import numpy as np
import pandas as pd
dfdic = {"col1": ['azul', 'amarillo', 'amarillo', np.nan], "col2": [4, 5, 8, 10]}
df = pd.DataFrame(dfdic)


I want to convert the col1 column to dummy variables. I can do that with:



pd.get_dummies(df, columns=['col1']).head()


which gives



   col2  col1_amarillo  col1_azul
0     4              0          1
1     5              1          0
2     8              1          0
3    10              0          0


The NaN in col1 has been encoded as zeros in both dummy columns. This makes sense, because it says that the row does not belong to either category. However, how can I replace those zeros with NaNs, so that I get



   col2  col1_amarillo  col1_azul
0     4              0          1
1     5              1          0
2     8              1          0
3    10            NaN        NaN









  • If df2 is your df with dummies, df2[df2["col2"].isna()] = np.nan?

    – Evan
    Nov 15 '18 at 22:01
















python pandas missing-data






edited Nov 15 '18 at 22:30 by Vladimir Vargas

asked Nov 15 '18 at 21:54 by Vladimir Vargas



1 Answer

mask + isnull



You can use mask to set selected columns to null depending on another series. Here df is assumed to be the frame returned by get_dummies.



# null out the dummy columns (everything after col2) in rows where col2 is null
df.iloc[:, 1:] = df.iloc[:, 1:].mask(df['col2'].isnull())

print(df)

   col2  col1_amarillo  col1_azul
0   4.0            0.0        1.0
1   5.0            1.0        0.0
2   8.0            1.0        0.0
3   NaN            NaN        NaN
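
In the frame as posted in the question, col2 has no missing values, so a mask keyed on col2 would leave the dummies untouched. Below is a minimal sketch of the same mask idea keyed on the original col1 instead. The names dummies and result are illustrative and not from the question, and dtype=float is an assumption made so that the dummy columns can hold NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["azul", "amarillo", "amarillo", np.nan],
    "col2": [4, 5, 8, 10],
})

# Encode col1 as 0.0/1.0 dummies; float dtype so the columns can store NaN.
dummies = pd.get_dummies(df["col1"], prefix="col1", dtype=float)

# Set every dummy to NaN in the rows where the original col1 was missing.
dummies = dummies.mask(df["col1"].isna())

# Put the untouched col2 back next to the masked dummies.
result = pd.concat([df[["col2"]], dummies], axis=1)
print(result)
```

Building the dummies from the single column and concatenating afterwards avoids relying on column positions such as iloc[:, 1:], which may matter if the real frame has more columns.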





answered Nov 15 '18 at 22:01 by jpp
