MySql: Select Distinct for words in different order











up vote
1
down vote

favorite












I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.



For example:



+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+


I would like to get that result:



+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+


My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.



Any thoughts?










share|improve this question
























  • Why Casey Dean Davidson changed to Casey Davidson ?
    – Madhur Bhaiya
    Nov 10 at 18:14










  • Thanks, It was my mistake ;)
    – fioljnw
    Nov 10 at 18:19










  • how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
    – FatemehNB
    Nov 10 at 18:21








  • 1




    Is it possible to have more than 3 substrings in the name ?
    – Madhur Bhaiya
    Nov 10 at 18:26






  • 1




    @MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
    – fioljnw
    Nov 10 at 18:57















up vote
1
down vote

favorite












I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.



For example:



+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+


I would like to get that result:



+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+


My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.



Any thoughts?










share|improve this question
























  • Why Casey Dean Davidson changed to Casey Davidson ?
    – Madhur Bhaiya
    Nov 10 at 18:14










  • Thanks, It was my mistake ;)
    – fioljnw
    Nov 10 at 18:19










  • how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
    – FatemehNB
    Nov 10 at 18:21








  • 1




    Is it possible to have more than 3 substrings in the name ?
    – Madhur Bhaiya
    Nov 10 at 18:26






  • 1




    @MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
    – fioljnw
    Nov 10 at 18:57













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.



For example:



+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+


I would like to get that result:



+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+


My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.



Any thoughts?










share|improve this question















I have problem with creating query, which getting no duplicate values form my table. Unfortunately, Full Name column has Name and Surname in different order.



For example:



+----+----------------------+
| ID | Full Name |
+----+----------------------+
| 1 | Marshall Wilson |
| 2 | Wilson Marshall |
| 3 | Lori Hill |
| 4 | Hill Lori |
| 5 | Casey Dean Davidson |
| 6 | Davidson Casey Dean |
+----+----------------------+


I would like to get that result:



+----+-----------------------+
| ID | Full Name |
+----+-----------------------+
| 1 | Marshall Wilson |
| 3 | Lori Hill |
| 5 | Casey Dean Davidson |
+----+-----------------------+


My target is to create query, which getting in similar way, for example: select distinct for Name and Surname in the same order.



Any thoughts?







mysql select distinct






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 10 at 18:16

























asked Nov 10 at 18:09









fioljnw

83




83












  • Why Casey Dean Davidson changed to Casey Davidson ?
    – Madhur Bhaiya
    Nov 10 at 18:14










  • Thanks, It was my mistake ;)
    – fioljnw
    Nov 10 at 18:19










  • how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
    – FatemehNB
    Nov 10 at 18:21








  • 1




    Is it possible to have more than 3 substrings in the name ?
    – Madhur Bhaiya
    Nov 10 at 18:26






  • 1




    @MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
    – fioljnw
    Nov 10 at 18:57


















  • Why Casey Dean Davidson changed to Casey Davidson ?
    – Madhur Bhaiya
    Nov 10 at 18:14










  • Thanks, It was my mistake ;)
    – fioljnw
    Nov 10 at 18:19










  • how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
    – FatemehNB
    Nov 10 at 18:21








  • 1




    Is it possible to have more than 3 substrings in the name ?
    – Madhur Bhaiya
    Nov 10 at 18:26






  • 1




    @MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
    – fioljnw
    Nov 10 at 18:57
















Why Casey Dean Davidson changed to Casey Davidson ?
– Madhur Bhaiya
Nov 10 at 18:14




Why Casey Dean Davidson changed to Casey Davidson ?
– Madhur Bhaiya
Nov 10 at 18:14












Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19




Thanks, It was my mistake ;)
– fioljnw
Nov 10 at 18:19












how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
– FatemehNB
Nov 10 at 18:21






how do you assure that which one in the correct one. I mean Marshall Wilson OR Wilson Marshall?
– FatemehNB
Nov 10 at 18:21






1




1




Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26




Is it possible to have more than 3 substrings in the name ?
– Madhur Bhaiya
Nov 10 at 18:26




1




1




@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57




@MadhurBhaiya, db-fiddle.com/f/qg1jVT23JSmfURjReWL1sn/0
– fioljnw
Nov 10 at 18:57












1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.



We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.



We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.



Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.





Query #1



SELECT 
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |




View on DB Fiddle






share|improve this answer























  • Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
    – fioljnw
    Nov 10 at 19:47










  • @fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
    – Madhur Bhaiya
    Nov 10 at 19:49










  • you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
    – fioljnw
    Nov 10 at 23:57











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241947%2fmysql-select-distinct-for-words-in-different-order%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.



We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.



We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.



Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.





Query #1



SELECT 
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |




View on DB Fiddle






share|improve this answer























  • Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
    – fioljnw
    Nov 10 at 19:47










  • @fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
    – Madhur Bhaiya
    Nov 10 at 19:49










  • you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
    – fioljnw
    Nov 10 at 23:57















up vote
1
down vote



accepted










It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.



We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.



We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.



Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.





Query #1



SELECT 
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |




View on DB Fiddle






share|improve this answer























  • Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
    – fioljnw
    Nov 10 at 19:47










  • @fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
    – Madhur Bhaiya
    Nov 10 at 19:49










  • you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
    – fioljnw
    Nov 10 at 23:57













up vote
1
down vote



accepted







up vote
1
down vote



accepted






It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.



We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.



We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.



Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.





Query #1



SELECT 
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |




View on DB Fiddle






share|improve this answer














It requires lots of String operations, and usage of multiple Derived Tables. It may not be efficient.



We first tokenize the FullName into multiple words it is made out of. For that we use a number generator table gen. In this case, I have assumed that maximum number of substrings is 3. You can easily extend it further by adding more Selects, like, SELECT 4 UNION ALL .. and so on.



We use Substring_Index() with Replace() function to get a substring out, using a single space character (' ') as Delimiter. Trim() is used to remove any leading/trailing spaces left.



Now, the trick is to use this result-set as a Derived table, and do a Group_Concat() on the words such that they are sorted in a ascending order. This way even the duplicate names (but substrings in different order), will get similar words_sorted value. Eventually, we simply need to Group By on words_sorted to weed out the duplicates.





Query #1



SELECT 
MIN(dt2.ID) AS ID,
MIN(dt2.FullName) AS FullName
FROM
(
SELECT
dt1.ID,
dt1.FullName,
GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
FROM
(
SELECT e.ID,
e.FullName,
TRIM(REPLACE(
SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
'')) AS word
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
) AS dt1
GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |




View on DB Fiddle







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 10 at 19:33

























answered Nov 10 at 19:19









Madhur Bhaiya

15.8k52136




15.8k52136












  • Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
    – fioljnw
    Nov 10 at 19:47










  • @fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
    – Madhur Bhaiya
    Nov 10 at 19:49










  • you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
    – fioljnw
    Nov 10 at 23:57


















  • Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
    – fioljnw
    Nov 10 at 19:47










  • @fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
    – Madhur Bhaiya
    Nov 10 at 19:49










  • you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
    – fioljnw
    Nov 10 at 23:57
















Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47




Thank you so so much. I see how much work you put into it but I'm afraid that query can be too heavy for larger base.
– fioljnw
Nov 10 at 19:47












@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49




@fioljnw yeah I know about the performance part. But unfortunately in your case, I feel that this is one of the few possible solutions, using MySQL. You may also explore Full Text Searching or use specialized databases for this kind of fuzzy searching like Apache SOLR etc
– Madhur Bhaiya
Nov 10 at 19:49












you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57




you give me great idea. I add new column with sorted substring in full name. In simpler and faster way, I would get unique values. Thank you!
– fioljnw
Nov 10 at 23:57


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241947%2fmysql-select-distinct-for-words-in-different-order%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Florida Star v. B. J. F.

Danny Elfman

Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues