How to check whether each group in a pandas DataFrame has the same data

I have a pandas dataframe as below

id  name  Base   field1    field2           field3
1   AA     Y      Yes      Consumer         Not Applicable
1   BB     N      Yes      Consumer         Not Applicable
2   CC     Y      Yes      Consumer         Not Applicable
2   DD     N      Yes      Not Applicable   Not Applicable
2   EE     N      No       Not Applicable   Modified
3   FF     Y      Yes      Not Applicable   Applicable
3   GG     N      Yes      Not Applicable   Not Applicable
3   HH     N      Yes      Not Applicable   Not Applicable

I want to group this dataframe by the id column, check whether every other column holds the same value in all rows of each group, and finally write out the results.

I tried the following to validate the data in each group, but it always returns True.

Code:

result_list = []
for col in df.columns:
    result = df.groupby(level=0)[col].apply(lambda x: len(set(x)) == 1)
    result_list.append(result)

final = pd.concat(result_list, axis=1)

The expected result is

id  name     field1   field2           field3           Error
1   AA       Yes      Consumer         Not Applicable   Pass
1   BB       Yes      Consumer         Not Applicable   Pass
2   CC       Yes      Consumer         Not Applicable   field1, field2, field3 mismatch for ID: 2
2   DD       Yes      Not Applicable   Not Applicable   field1, field2, field3 mismatch for ID: 2
2   EE       No       Not Applicable   Modified         field1, field2, field3 mismatch for ID: 2
3   FF       Yes      Not Applicable   Applicable       field3 mismatch for ID: 3
3   GG       Yes      Not Applicable   Not Applicable   field3 mismatch for ID: 3
3   HH       Yes      Not Applicable   Not Applicable   field3 mismatch for ID: 3

Any help on this?

asked Nov 13 '18 at 12:52 by Osceria
edited Nov 14 '18 at 10:23 by Akhilesh Pandey

What's your desired result, only id = 1 passes your test?

– jpp
Nov 13 '18 at 13:38

Hi, I've updated the dataframe and the expected result. Let me know if it helps.

– Osceria
Nov 13 '18 at 14:02

python-3.x pandas dataframe pandas-groupby


2 Answers

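The following code may give you what you want (it assumes that df has an index named id):

def handler(df):
    # Report the first checked column that has more than one distinct value in this group.
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            return 'error in {} for id {}'.format(col, df.index[0])
    else:
        # The loop finished without returning, so every column is uniform.
        return 'pass'

# One message per id, then attach it to every row of that id.
result = df.groupby(level=0).apply(handler)
result = df.reset_index().merge(result.to_frame().reset_index(), on='id')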

result is:

   id name field1          field2          field3                         0
0   1   AA    Yes        Consumer  Not Applicable                      pass
1   1   BB    Yes        Consumer  Not Applicable                      pass
2   2   CC    Yes        Consumer  Not Applicable  error in field1 for id 2
3   2   DD    Yes  Not Applicable  Not Applicable  error in field1 for id 2
4   2   EE     No  Not Applicable        Modified  error in field1 for id 2
5   3   FF    Yes  Not Applicable      Applicable  error in field3 for id 3
6   3   GG    Yes  Not Applicable  Not Applicable  error in field3 for id 3
7   3   HH    Yes  Not Applicable  Not Applicable  error in field3 for id 3
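The assumption about the index matters here. The question does not say how df is indexed, so the following is only a minimal sketch under the assumption that it still has the default integer index. In that case `groupby(level=0)` groups by row number, each group holds a single row, and every check trivially passes. Setting id as the index makes level 0 the id. The small frame below is illustrative and is not the full data from the question.

```python
import pandas as pd

# Illustrative subset of the question's data (assumed default RangeIndex).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "field1": ["Yes", "Yes", "Yes", "Yes", "No"],
})

# Default index: level 0 is the row number, so each group holds one row
# and a single row always has exactly one distinct value.
by_row = df.groupby(level=0)["field1"].nunique().eq(1)

# With id as the index, level 0 is the id, so rows are grouped per id.
by_id = df.set_index("id").groupby(level=0)["field1"].nunique().eq(1)

print(by_row.all())
print(by_id)
```

Here `by_row` is True for every row, while `by_id` is False for id 2.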

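EDIT: a small change to handler so that it reports every mismatching column instead of only the first one:

def handler(df):
    # Collect all checked columns that have more than one distinct value in this group.
    cols = list()
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            cols.append(col)
    if cols:
        return 'error in {} for id {}'.format(', '.join(cols), df.index[0])
    else:
        return 'pass'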
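For reference, here is a self-contained sketch that runs the edited handler end to end on the sample data, with the Base column left out. The message wording follows the expected output in the question. The `FIELDS` constant and the `Error` column name are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data from the question, indexed by id as the answer assumes.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3, 3],
    "name": ["AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"],
    "field1": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"],
    "field2": ["Consumer", "Consumer", "Consumer", "Not Applicable",
               "Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable"],
    "field3": ["Not Applicable", "Not Applicable", "Not Applicable",
               "Not Applicable", "Modified", "Applicable",
               "Not Applicable", "Not Applicable"],
}).set_index("id")

FIELDS = ["field1", "field2", "field3"]  # hypothetical name for the checked columns


def handler(group):
    # Collect every checked column with more than one distinct value.
    cols = [col for col in FIELDS if group[col].nunique() > 1]
    if cols:
        return "{} mismatch for ID: {}".format(", ".join(cols), group.index[0])
    return "Pass"


# One message per id, named so the merged column is "Error" rather than 0.
errors = df.groupby(level=0).apply(handler).rename("Error")
result = df.reset_index().merge(errors.reset_index(), on="id")
print(result)
```

Naming the Series before merging avoids the column labelled 0 seen in the output above.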

answered Nov 13 '18 at 14:52 by Poolka
edited Nov 14 '18 at 6:19

Hi Poolka, this code almost produces the expected result. However, the error column does not show when the data mismatches on more than one field. For id 2 it should read: Field1, Field2 & Field3 mismatch for ID: 2. Any thoughts?

– Osceria
Nov 13 '18 at 16:28

The comparison only ever reports the first mismatching field in the list; the other fields are skipped.

– Osceria
Nov 14 '18 at 0:52

@Osceria The code in the answer is the basis that works and performs something pretty close to what you want. Feel free to modify it (column names, handler, and so on) to meet your expectations. About the issue in the comment - check the EDIT addition.

– Poolka
Nov 14 '18 at 6:25

It works perfectly for my requirements after slight changes. I just posted another question that is similar but has an extra check: stackoverflow.com/questions/53295685/…

– Osceria
Nov 14 '18 at 8:39


You could group by id and aggregate each column to its number of unique values per group. A column has a mismatch wherever that number is greater than 1:

df[df.columns.drop('name')].groupby('id').agg(lambda x: len(x.unique())) > 1

This gives the following output, from which you could construct your string:

    field1  field2  field3
id
1   False   False   False
2   True    True    True
3   False   False   True
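One way to turn the boolean table above into the message column is sketched below. It assumes that id is a regular column. The `add_error_column` helper, its `key` and `ignore` parameters, and the `Error` column name are all hypothetical. Because the checked columns are derived from the frame itself, no field names are hardcoded. Note that `nunique` ignores NaN by default, which differs from `len(x.unique())` when a group contains missing values.

```python
import pandas as pd


def add_error_column(df, key="id", ignore=("name",)):
    # Check every column except the key and the ignored ones.
    fields = df.columns.drop([key, *ignore])
    # True where a group has more than one distinct value in a column.
    mismatch = df.groupby(key)[list(fields)].nunique() > 1

    def describe(row):
        # Names of the columns flagged True for this id.
        bad = row[row].index.tolist()
        if not bad:
            return "Pass"
        return "{} mismatch for ID: {}".format(", ".join(bad), row.name)

    # One message per id, then broadcast back onto the rows via the key.
    messages = mismatch.apply(describe, axis=1)
    return df.assign(Error=df[key].map(messages))


# Small illustrative frame (not the full data from the question).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["AA", "BB", "CC", "DD"],
    "field1": ["Yes", "Yes", "Yes", "No"],
    "field2": ["Consumer", "Consumer", "Consumer", "Consumer"],
})
checked = add_error_column(df)
print(checked)
```

Mapping the per-id messages through the key column keeps the original row order and avoids a merge.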

answered Nov 13 '18 at 14:53 by Franco Piccolo
edited Nov 13 '18 at 17:52

This helps. What if the column names to be validated differ between iterations? I run this piece in a for loop that iterates over different dataframes (df1, df2, df3) whose columns are different, so I don't want to hardcode field names that keep changing from one dataframe to the next.

– Osceria
Nov 13 '18 at 15:54

See the edit: you can pass the column list with the 'name' column dropped, so any number of other fields will be checked.

– Franco Piccolo
Nov 13 '18 at 17:52

Okay. Suppose I add another column (Base) to the dataframe (see the edited question). In every group based on ID there will be exactly one row with Base = 'Y', and the other rows in the group will be 'N'. The values in the row where Base = 'Y' should be the reference, and the rows with Base = 'N' should be validated against it. The columns that differ in each row should be noted in an error column. Any thoughts?

– Osceria
Nov 14 '18 at 0:49

That completely changes the scope of the question and the solution. I would suggest writing another question with a different input and output for clarification.

– Franco Piccolo
Nov 14 '18 at 6:53

Okay. Posted here stackoverflow.com/questions/53295685/…

– Osceria
Nov 14 '18 at 8:40
