unix - remove (i) empty space between single characters and (ii) more than X consecutive instances of a word
I would like to
(i) remove the blank space between characters only if these characters are single; for instance
Down [Enter] p s -- a u x [Delete]
should become
Down [Enter] ps -- aux [Delete]
(ii) when a word is repeated consecutively more than X times, keep only the first X copies of that run, so that (say X=2)
[Delete] [Delete] [Delete] [Delete] [Delete] [Delete] ab initio [Delete] [Delete] [Delete] [Delete] [Delete] [Delete] ab definitio
becomes
[Delete] [Delete] ab initio [Delete] [Delete] ab definitio
Thanks!
unix awk sed
asked Nov 9 at 19:03
Pau
Welcome to SO. Stack Overflow is a question and answer site for professional and enthusiast programmers. The goal is that you add some code of your own to your question to show at least the research effort you made to solve this yourself.
– Cyrus
Nov 9 at 19:16
1 Answer
You did not get a lot of responses. I think the main reason is the combination of two different questions, both non-trivial. Normally it helps to show your own effort, but I understand your effort might have been thinking for hours about "where to start".
The first question, removing spaces between single characters, can be done with a loop in sed:
echo 'Down [Enter] p s -- a u x [Delete] ' |
  sed -r ':a;s/( [^ ]|\r) ([^ ])( |$)/\1\2\r\3/;ta; s/\r//g'
Down [Enter] ps -- aux [Delete]
Explanation:
With a direct approach, a u x will be changed into au x after the first replacement, and the second space will never be matched. You need to go over the line more than once and remember that the letter u in au x was a single character in the original string.
To remember the places where a replacement has been made, we insert a \r as a marker (and remove it later).
:a; - Label to return to for the next replacement.
( [^ ]|\r) - A space followed by a non-space character, OR our temporary \r marker.
 ([^ ]) - A space followed by a single non-space character.
( |$) - A space or end-of-line.
/\1\2\r\3/ - Replace with the two remembered characters, insert the marker, and put back the space when it was not the last character of the line.
ta - Go back to the label :a when something was replaced.
s/\r//g - Remove our temporary markers.
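If the marker trick feels fragile, the same idea can be sketched in awk without any marker character. This is only a sketch under a few assumptions: the name join_singles is made up for illustration, words are separated by blanks, and it is acceptable that awk's default field splitting collapses repeated blanks and drops leading and trailing ones.

```shell
# join_singles (hypothetical helper): read lines on stdin and glue together
# runs of single-character words, leaving longer words untouched.
join_singles() {
  awk '{
    out = ""
    prev_single = 0
    for (i = 1; i <= NF; i++) {
      single = (length($i) == 1)
      # No separator between two adjacent single-character words.
      if (i > 1 && !(single && prev_single))
        out = out " "
      out = out $i
      prev_single = single
    }
    print out
  }'
}

echo 'Down [Enter] p s -- a u x [Delete]' | join_singles
```

Unlike the sed loop, this also treats a single character at the very start of the line as a candidate for joining, because it looks at whole fields rather than at a preceding space.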
The second question is difficult too. The next solution is close but incorrect:
for (( X=2; X<8; X++)); do
  echo "X=$X (incorrect solution)"
  echo 'some some some some some some some some some some some input' |
  sed -r 's/([^ ]+[ ]+)(\1{'${X}'})(\1+)/\2/g'
done
The problem is when the repeated string also appears in another place, as in some some some input some some some, or worse, some some some input input input.
I do not see an easy fix for the sed solution, but awk will help here.
To count repeated words, my solution treats each word as one record.
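for (( X=2; X<8; X++)); do
  echo "X=$X"
  echo 'some some some some some some some some some some some input some some some some' |
  awk -v x="$X" 'BEGIN {RS="[ \n]"; ORS=""; repeated=1}
    # RS is a regular expression here, so each word becomes one record.
    # ORS is empty because every print supplies its own trailing space.
    { if (last==$0)
        repeated++;
      else
        repeated=1;
    }
    {last=$0}
    # Print the word only while its run is no longer than x.
    repeated <= x {print $0" "}
    END {print "\n"}
  '
done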
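In case the awk on your system does not accept a regular expression as RS, here is a sketch that walks over the fields of each line instead. The name cap_repeats and its argument are made up for illustration, and I am assuming that runs are counted per line (a run does not continue across a line break) and that collapsing repeated blanks is acceptable.

```shell
# cap_repeats MAX (hypothetical helper): keep at most MAX consecutive
# copies of each word on every input line.
cap_repeats() {
  awk -v max="$1" '{
    out = ""
    last = ""
    count = 0
    for (i = 1; i <= NF; i++) {
      # Appending "" forces a string comparison, so 1 and 1.0 stay different.
      if (i > 1 && ($i "") == (last ""))
        count++
      else
        count = 1
      last = $i
      # Emit the word only while the current run is within the limit.
      if (count <= max)
        out = out (out == "" ? "" : " ") $i
    }
    print out
  }'
}

echo '[Delete] [Delete] [Delete] [Delete] ab initio [Delete] [Delete] [Delete] ab definitio' |
  cap_repeats 2
```

Because the counter is reset whenever a different word appears, a word that shows up again later in the line starts a fresh run, which is the case the sed attempt could not handle.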
answered Nov 10 at 21:21
Walter A
Dear Walter. You are absolutely right, my apologies. It is my first post here. The solution I had come up with involves Python, while I would prefer to stick to the usual Unix tools. I was looking into tr but I did not see a way. Thanks for your reply. There's a problem with (i), because your solution is removing all 'r' from the file, at least with my sed (I'm using OpenBSD here, not Linux). As for (ii), I will try it now.
– Pau
Nov 11 at 17:29
(i) In my sed the carriage return \r works. It should be a unique character, perhaps Q for a test. Try replacing the \r with control-v control-m.
– Walter A
Nov 11 at 20:21
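Instead of typing control-v control-m by hand, the literal carriage return can be kept in a shell variable. This is a sketch only: the names cr and join_singles_sed are made up for illustration, I am assuming the input itself contains no carriage returns, and I have not tried it on OpenBSD. It uses -E and separate -e expressions because some sed implementations read everything after :a up to the end of the line as part of the label.

```shell
# Store one literal carriage return. Command substitution strips only
# trailing newlines, so the CR survives.
cr=$(printf '\r')

# join_singles_sed (hypothetical helper): the same loop as in the answer,
# but with a real CR character as the marker instead of the \r escape.
join_singles_sed() {
  sed -E -e ':a' \
         -e "s/( [^ ]|$cr) ([^ ])( |\$)/\\1\\2$cr\\3/" \
         -e 'ta' \
         -e "s/$cr//g"
}

echo 'Down [Enter] p s -- a u x [Delete] ' | join_singles_sed
```

Since the marker is now a real control character rather than something sed has to interpret, ordinary letters such as r are left alone.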