Merge Dataframes on basis of coordinates having no common columns
INPUT :
df1
Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5
df2
Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1
Expected OUTPUT :
Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1
I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.
For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.
My Approach :
I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.
for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1
df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C
Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds
to run df1 = 20,000 rows and df2 = 3500 rows.
python pandas dataframe geometry
add a comment |
INPUT :
df1
Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5
df2
Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1
Expected OUTPUT :
Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1
I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.
For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.
My Approach :
I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.
for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1
df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C
Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds
to run df1 = 20,000 rows and df2 = 3500 rows.
python pandas dataframe geometry
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03
add a comment |
INPUT :
df1
Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5
df2
Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1
Expected OUTPUT :
Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1
I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.
For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.
My Approach :
I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.
for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1
df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C
Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds
to run df1 = 20,000 rows and df2 = 3500 rows.
python pandas dataframe geometry
INPUT :
df1
Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5
df2
Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1
Expected OUTPUT :
Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1
I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.
For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.
My Approach :
I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.
for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1
df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C
Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds
to run df1 = 20,000 rows and df2 = 3500 rows.
python pandas dataframe geometry
python pandas dataframe geometry
edited Nov 13 '18 at 10:08
asked Nov 12 '18 at 10:40
Mahendra Singh
405
405
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03
add a comment |
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03
add a comment |
1 Answer
1
active
oldest
votes
You can use apply
and masking. Example:
def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]
return row
result = df1.apply(compare, axis=1)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53260436%2fmerge-dataframes-on-basis-of-coordinates-having-no-common-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use apply
and masking. Example:
def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]
return row
result = df1.apply(compare, axis=1)
add a comment |
You can use apply
and masking. Example:
def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]
return row
result = df1.apply(compare, axis=1)
add a comment |
You can use apply
and masking. Example:
def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]
return row
result = df1.apply(compare, axis=1)
You can use apply
and masking. Example:
def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]
return row
result = df1.apply(compare, axis=1)
answered Nov 12 '18 at 15:12
ievbu
364
364
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53260436%2fmerge-dataframes-on-basis-of-coordinates-having-no-common-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?
– Michael
Nov 12 '18 at 15:03