RASA NLU: Can't extract entity
I've trained my Rasa NLU model so that it recognizes the content between square brackets as the pst entity. For training, I covered both of the scenarios below with more than 50 examples.
There are two scenarios (the only difference is the spacing):
When I pass
http://www.google.comm, 1283923, [9283911,9309212,9283238]
it considers only the opening [ bracket as the pst entity.
When I pass
http://www.google.comm, 1283923, [9283911, 9309212, 9283238]
it works fine and recognizes [9283911, 9309212, 9283238] as the pst entity, as expected.
For scenario 1, I've tried all the possible pipelines, but the model only recognizes the first square bracket [ as the pst entity.
In the response, I am getting this output:
{
  'intent': {
    'name': None,
    'confidence': 0.0
  },
  'entities': [
    {
      'start': 0,
      'end': 22,
      'value': 'http://www.google.comm',
      'entity': 'url',
      'confidence': 0.8052099168500071,
      'extractor': 'ner_crf'
    },
    {
      'start': 24,
      'end': 31,
      'value': '1283923',
      'entity': 'defect_id',
      'confidence': 0.8334249141074151,
      'extractor': 'ner_crf'
    },
    {
      'start': 33,
      'end': 34,
      'value': '[',
      'entity': 'pst',
      'confidence': 0.5615805162522188,
      'extractor': 'ner_crf'
    }
  ],
  'intent_ranking': [],
  'text': 'http://www.google.comm, 1283923, [9283911,9309212,9283238]'
}
So, can anyone tell me what I am missing in the configuration? The problem happens only because of the spacing, and the model should be able to handle both spacings since the training data covers both scenarios.
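Since ner_crf assigns entity labels per token, a likely first check is how the tokenizer splits the two inputs: without the spaces, the bracketed list may be tokenized differently from the training examples, leaving only the [ token for the CRF to tag. Below is a minimal sketch for inspecting this with spaCy, assuming a spaCy-based pipeline; the model name en_core_web_sm is an assumption and should match whatever model the pipeline is configured with.

# Compare how spaCy tokenizes the two inputs; ner_crf can only tag
# whole tokens, so the token boundaries decide what it can extract.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: adjust to your pipeline's model

no_spaces = "http://www.google.comm, 1283923, [9283911,9309212,9283238]"
with_spaces = "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]"

for text in (no_spaces, with_spaces):
    print([token.text for token in nlp(text)])

If the unspaced variant comes back as different tokens, that mismatch between the token boundaries and the annotated entity span is the usual reason the CRF only picks up the [ token.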
rasa-nlu
asked Nov 10 at 15:06
abhishake
What pipeline, specifically, what tokenizer generated the above? What makes you think you need NLP rather than just a regex pattern matcher?
– Caleb Keller
Nov 10 at 16:25
I am running NLU for other intents and entities too, so I want to use only Rasa NLU for this project.
– abhishake
Nov 12 at 13:11
I am using the spacy_sklearn pipeline only. Should I use any other pipeline for extraction?
– abhishake
Nov 12 at 13:13
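Whichever pipeline is tried, it helps to retrain and parse both spacings side by side to see exactly which tokens ner_crf tags. A minimal sketch using the Rasa NLU 0.x Python API; the file names nlu_data.json and nlu_config.yml are placeholders, not from the question.

# Retrain and compare the two spacings side by side.
# "nlu_data.json" and "nlu_config.yml" are placeholder file names.
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

training_data = load_data("nlu_data.json")
trainer = Trainer(config.load("nlu_config.yml"))
interpreter = trainer.train(training_data)

for text in (
    "http://www.google.comm, 1283923, [9283911,9309212,9283238]",
    "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]",
):
    result = interpreter.parse(text)
    print(text)
    print([(e["entity"], e["value"]) for e in result["entities"]])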
1 Answer
It is a good idea to use a regex for your purpose. Rasa NLU supports extracting entities with regexes. Normal NLU training data looks something like this:
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "Hi",
        "intent": "greet",
        "entities": []
      }
    ]
  }
}
You can provide regex data for training in the NLU JSON file as below; note that the square brackets must be escaped in the pattern so they are matched literally:
{
  "rasa_nlu_data": {
    "regex_features": [
      {
        "name": "pst",
        "pattern": "\\[.*\\]"
      }
    ]
  }
}
Reference: Regular Expression in Rasa NLU
answered Nov 12 at 15:48
Karthik Sunil
I've tried this solution, but unfortunately, it doesn't make any difference in the output.
– abhishake
Nov 12 at 16:26
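Note that regex_features only supply extra features to ner_crf (through the regex featurizer, if it is part of the pipeline); they do not extract entities on their own, which would explain why the output did not change. If the CRF keeps missing the unspaced variant, the plain regex post-processing suggested in the comments is a pragmatic fallback. A minimal sketch; the function name and pattern here are assumptions, not part of the original question or answer.

import re

# A literal "[", then anything that is not "]", then a literal "]".
PST_PATTERN = re.compile(r"\[[^\]]*\]")

def extract_pst(text):
    """Return the bracketed pst value (brackets included), or None if absent."""
    match = PST_PATTERN.search(text)
    return match.group(0) if match else None

print(extract_pst("http://www.google.comm, 1283923, [9283911,9309212,9283238]"))
# -> [9283911,9309212,9283238]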