unix - remove (i) empty space between single characters and (ii) more than X consecutive instances of a word











I would like to

(i) remove the blank space between characters only when those characters stand alone as single-character words. For instance,

    Down [Enter] p s -- a u x [Delete] 

should become

    Down [Enter] ps -- aux [Delete] 

(ii) remove words that are repeated consecutively more than X times, keeping only the first X of each run, so that (say X=2)

     [Delete] [Delete] [Delete] [Delete] [Delete] [Delete] ab initio [Delete] [Delete] [Delete] [Delete] [Delete] [Delete] ab definitio

becomes

     [Delete] [Delete] ab initio [Delete] [Delete] ab definitio

Thanks!







unix awk sed
















asked Nov 9 at 19:03 by Pau












  • Welcome to SO. Stack Overflow is a question and answer site for professional and enthusiast programmers. The goal is that you add some code of your own to your question to show at least the research effort you made to solve this yourself.
    – Cyrus
    Nov 9 at 19:16






























1 Answer
You did not get a lot of responses. I think the main reason is that you combined two different questions, both non-trivial. Normally it helps to show your own effort, but I understand your effort might have been thinking for hours about where to start.



The first question, removing spaces between single characters, can be done with a loop in sed:



    echo 'Down [Enter] p s -- a u x [Delete] ' |
    sed -r ':a;s/( [^ ]|\r) ([^ ])( |$)/\1\2\r\3/;ta; s/\r//g'
    Down [Enter] ps -- aux [Delete]


Explanation:
With a direct approach, a u x is changed into au x by the first replacement, and the remaining space is then missed because u no longer looks like a single character. You need to loop over the replacements and remember that the letter u in au x was a singleton in the original string.

To remember where a replacement has been made, we insert a \r (carriage return) as a marker and remove it afterwards.

:a; Label to return to for the next replacement.
( [^ ]|\r) A space followed by a single non-space character, OR our temporary \r marker.
 ([^ ]) A space followed by a single non-space character.
( |$) A space or end-of-line.
/\1\2\r\3/ Replace with the two remembered characters, then the marker, then the trailing space (which is absent when the character was the last one on the line).
ta Go back to the label :a when something was replaced.
s/\r//g Remove the temporary markers.
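If the marker trick is awkward with your sed, the same idea can be sketched in awk, where the length of each field says directly whether it was a single character in the original line. This is only a sketch: the function name join_singles is made up for illustration, and it assumes that words are separated by blanks and that collapsing repeated blanks (and dropping leading or trailing ones) is acceptable.

```shell
# Hypothetical helper: join runs of single-character words on each line.
join_singles() {
  awk '{
    out = ""
    prev_single = 0
    for (i = 1; i <= NF; i++) {
      single = (length($i) == 1)
      # Put a blank before the word unless it continues a run of singletons.
      if (i > 1 && !(single && prev_single)) out = out " "
      out = out $i
      prev_single = single
    }
    print out
  }'
}

printf '%s\n' 'Down [Enter] p s -- a u x [Delete] ' | join_singles
# prints: Down [Enter] ps -- aux [Delete]
```

No marker character is needed here, so nothing has to be reserved in the input.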



The second question is difficult too. The next solution is close but incorrect:



    for (( X=2; X<8; X++ )); do
      echo "X=$X (incorrect solution)"
      echo 'some some some some some some some some some some some input' |
      sed -r 's/([^ ]+[ ]+)(\1{'${X}'})(\1+)/\2/g'
    done


The problem arises when the repeated string also appears in another place, as in
some some some input some some some, or worse, some some some input input input.



I do not see an easy fix for the sed solution, but awk will help here.

To count repeated words, my solution treats each word as one record.



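    for (( X=2; X<8; X++ )); do
      echo "X=$X"
      # A regex RS works in gawk and mawk; POSIX leaves a multi-character RS unspecified.
      echo 'some some some some some some some some some some some input some some some some' |
      awk -v x="$X" 'BEGIN { RS="[ \n]"; ORS=""; repeated=1 }
        # Count how often the current word has been seen in a row.
        { if (last == $0)
            repeated++
          else
            repeated=1
        }
        { last = $0 }
        # Print only the first x words of each run.
        repeated <= x { print $0 " " }
        END { print "\n" }
      '
    done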
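Where a regular-expression RS is not available, a similar count can be kept per line by walking over the fields instead. A rough sketch in plain awk; the function name limit_repeats is hypothetical, and the sketch assumes blank-separated words, handles each line separately, and normalises the spacing to single blanks.

```shell
# Hypothetical helper: keep at most $1 consecutive copies of each word.
limit_repeats() {
  awk -v x="$1" '{
    out = ""; last = ""; count = 0
    for (i = 1; i <= NF; i++) {
      # Appending "" forces a string comparison, so 1 and 1.0 stay distinct.
      if (i > 1 && ($i "") == (last "")) count++; else count = 1
      last = $i
      if (count <= x) out = (out == "" ? $i : out " " $i)
    }
    print out
  }'
}

printf '%s\n' 'some some some input some some some' | limit_repeats 2
# prints: some some input some some
```

Because the counter restarts whenever the word changes, a second run of the same word later in the line is trimmed on its own.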
















answered Nov 10 at 21:21 by Walter A












  • Dear Walter. You are absolutely right, my apologies. It is my first post here. The solution I had come up with involves Python, while I would prefer to stick to the usual Unix tools. I was looking into tr but I did not see a way. Thanks for your reply. There's a problem with (i), because your solution is removing all 'r' from the file, at least with my sed (I'm using OpenBSD here, not Linux). As for (ii), I will try it now.
    – Pau
    Nov 11 at 17:29












  • (i) In my sed the carriage return \r works. The marker should be a unique character that does not occur in the input; you could use Q for a test. Try replacing the \r with a literal carriage return, typed as control-v control-m.
    – Walter A
    Nov 11 at 20:21
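A literal carriage return can also be placed in the script without typing control characters, by letting printf produce it. A minimal sketch, assuming a sed that accepts -E for extended regular expressions; the function name join_singles_sed and the variable cr are hypothetical. The commands are separated by newlines rather than semicolons because some sed implementations read a label up to the end of the line.

```shell
# Hypothetical helper: the sed loop with a literal CR as the temporary marker.
join_singles_sed() {
  cr=$(printf '\r')   # literal carriage return, no \r escape needed in sed
  sed -E ":a
s/( [^ ]|$cr) ([^ ])( |\$)/\\1\\2$cr\\3/
ta
s/$cr//g"
}

printf '%s\n' 'bar r s car' | join_singles_sed
# prints: bar rs car
```

The letter r in the input survives, since only the real carriage-return byte is removed at the end. Input that already contains carriage returns would need a different marker.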

















