Oracle SQL in string matches and table joins

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I need some guidance on how to approach this problem.

Consider the following 3 tables:
enter image description here

I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.

The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.

How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.

The resulting table would look like this:
enter image description here

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48

The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55

Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52

1

Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56

This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04

add a comment |

I need some guidance on how to approach this problem.

Consider the following 3 tables:
enter image description here

I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.

How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.

The resulting table would look like this:
enter image description here

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48

The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55

Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52

1

Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56

This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04

add a comment |

I need some guidance on how to approach this problem.

Consider the following 3 tables:
enter image description here

I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.

How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.

The resulting table would look like this:
enter image description here

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

I need some guidance on how to approach this problem.

Consider the following 3 tables:
enter image description here

I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.

How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.

The resulting table would look like this:
enter image description here

python oracle pandas plsql

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

edited Nov 16 '18 at 13:54

asked Nov 16 '18 at 13:42

vwdewaal

3392418

asked Nov 16 '18 at 13:42

vwdewaal

3392418

asked Nov 16 '18 at 13:42

vwdewaal

3392418

What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48

The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55

Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52

1

Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56

This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04

add a comment |

What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48

The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55

Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52

1

Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56

This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04

What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48

The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55

Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52

Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56

This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04

add a comment |

1 Answer
1

active

oldest

votes

I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.

Doing something simple like this to split the ids onto different rows:

select trim(column_value) partner_id,

       team

  from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))

Gives an output like this:

+------------+------+

| PARTNER_ID | TEAM |

+------------+------+

| [012345]   | Blue |

| [012346]   | Blue |

| [012347]   | Red  |

+------------+------+

Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.

After your data is in the easier format, the joins become easy.

select people.identifier,

       people.name,

       formatted_teams.team,

       formatted_meals.meal

from people

join (select trim(column_value) partner_id,

              team

       from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams

  on people.identifier = formatted_teams.partner_id

left join (select trim(column_value) plate_id,

            meal

       from meals,

       xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals

  on people.identifier = formatted_meals.plate_id

order by people.identifier

Output:

+------------+--------+------+--------+

| IDENTIFIER |  NAME  | TEAM |  MEAL  |

+------------+--------+------+--------+

| [012345]   | George | Blue | Salmon |

| [012346]   | Jenny  | Blue |        |

| [012347]   | Alex   | Red  | Pike   |

| [012347]   | Alex   | Red  | Cod    |

+------------+--------+------+--------+

answered Nov 16 '18 at 19:52

Patrick H

520513

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339049%2foracle-sql-in-string-matches-and-table-joins%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.

Doing something simple like this to split the ids onto different rows:

select trim(column_value) partner_id,

       team

  from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))

Gives an output like this:

+------------+------+

| PARTNER_ID | TEAM |

+------------+------+

| [012345]   | Blue |

| [012346]   | Blue |

| [012347]   | Red  |

+------------+------+

Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.

After your data is in the easier format, the joins become easy.

select people.identifier,

       people.name,

       formatted_teams.team,

       formatted_meals.meal

from people

join (select trim(column_value) partner_id,

              team

       from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams

  on people.identifier = formatted_teams.partner_id

left join (select trim(column_value) plate_id,

            meal

       from meals,

       xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals

  on people.identifier = formatted_meals.plate_id

order by people.identifier

Output:

+------------+--------+------+--------+

| IDENTIFIER |  NAME  | TEAM |  MEAL  |

+------------+--------+------+--------+

| [012345]   | George | Blue | Salmon |

| [012346]   | Jenny  | Blue |        |

| [012347]   | Alex   | Red  | Pike   |

| [012347]   | Alex   | Red  | Cod    |

+------------+--------+------+--------+

answered Nov 16 '18 at 19:52

Patrick H

520513

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

add a comment |

I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.

Doing something simple like this to split the ids onto different rows:

select trim(column_value) partner_id,

       team

  from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))

Gives an output like this:

+------------+------+

| PARTNER_ID | TEAM |

+------------+------+

| [012345]   | Blue |

| [012346]   | Blue |

| [012347]   | Red  |

+------------+------+

Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.

After your data is in the easier format, the joins become easy.

select people.identifier,

       people.name,

       formatted_teams.team,

       formatted_meals.meal

from people

join (select trim(column_value) partner_id,

              team

       from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams

  on people.identifier = formatted_teams.partner_id

left join (select trim(column_value) plate_id,

            meal

       from meals,

       xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals

  on people.identifier = formatted_meals.plate_id

order by people.identifier

Output:

+------------+--------+------+--------+

| IDENTIFIER |  NAME  | TEAM |  MEAL  |

+------------+--------+------+--------+

| [012345]   | George | Blue | Salmon |

| [012346]   | Jenny  | Blue |        |

| [012347]   | Alex   | Red  | Pike   |

| [012347]   | Alex   | Red  | Cod    |

+------------+--------+------+--------+

answered Nov 16 '18 at 19:52

Patrick H

520513

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

add a comment |

I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.

Doing something simple like this to split the ids onto different rows:

select trim(column_value) partner_id,

       team

  from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))

Gives an output like this:

+------------+------+

| PARTNER_ID | TEAM |

+------------+------+

| [012345]   | Blue |

| [012346]   | Blue |

| [012347]   | Red  |

+------------+------+

Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.

After your data is in the easier format, the joins become easy.

select people.identifier,

       people.name,

       formatted_teams.team,

       formatted_meals.meal

from people

join (select trim(column_value) partner_id,

              team

       from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams

  on people.identifier = formatted_teams.partner_id

left join (select trim(column_value) plate_id,

            meal

       from meals,

       xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals

  on people.identifier = formatted_meals.plate_id

order by people.identifier

Output:

+------------+--------+------+--------+

| IDENTIFIER |  NAME  | TEAM |  MEAL  |

+------------+--------+------+--------+

| [012345]   | George | Blue | Salmon |

| [012346]   | Jenny  | Blue |        |

| [012347]   | Alex   | Red  | Pike   |

| [012347]   | Alex   | Red  | Cod    |

+------------+--------+------+--------+

answered Nov 16 '18 at 19:52

Patrick H

520513

I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.

Doing something simple like this to split the ids onto different rows:

select trim(column_value) partner_id,

       team

  from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))

Gives an output like this:

+------------+------+

| PARTNER_ID | TEAM |

+------------+------+

| [012345]   | Blue |

| [012346]   | Blue |

| [012347]   | Red  |

+------------+------+

Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.

After your data is in the easier format, the joins become easy.

select people.identifier,

       people.name,

       formatted_teams.team,

       formatted_meals.meal

from people

join (select trim(column_value) partner_id,

              team

       from teams,

       xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams

  on people.identifier = formatted_teams.partner_id

left join (select trim(column_value) plate_id,

            meal

       from meals,

       xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals

  on people.identifier = formatted_meals.plate_id

order by people.identifier

Output:

+------------+--------+------+--------+

| IDENTIFIER |  NAME  | TEAM |  MEAL  |

+------------+--------+------+--------+

| [012345]   | George | Blue | Salmon |

| [012346]   | Jenny  | Blue |        |

| [012347]   | Alex   | Red  | Pike   |

| [012347]   | Alex   | Red  | Cod    |

+------------+--------+------+--------+

answered Nov 16 '18 at 19:52

Patrick H

520513

answered Nov 16 '18 at 19:52

Patrick H

520513

answered Nov 16 '18 at 19:52

Patrick H

520513

answered Nov 16 '18 at 19:52

Patrick H

520513

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

add a comment |

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ndtyjky