Solved

Tagging a sequence of tokens

  • 10 August 2021
  • 5 replies

Badge +1

Hello,

I was wondering whether it is possible to add a tag to a sequence of tokens, either manually with a tagging rule or automatically via a JavaScript implementation of ‘onTagger()’.

Please note that I mean the second level of disambiguation (token level), not the first level (atom level).

Thanks a lot for your help!

 


Best answer by mbellei 11 August 2021, 12:23

No, actually you can’t do this kind of tagging with jscript. You can only add, modify or remove tags from the existing tokenization.


5 replies

Badge +1

Hi @phmeier,

Actually, that is exactly what the tagger does.

If you write a tagger rule that matches a sequence of tokens, it creates a single tag covering the whole sequence.

For example, take the sentence: “I play baseball in the garden”

You can write a rule like this:

SCOPE SENTENCE
{
    TAGGER()
    {
        @MYTAG1[KEYWORD("in the garden")]
    }
}

It will tag “in the garden”.

In fact, if you then write an extraction rule like this:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @MYFIELD[TAG(MYTAG1)]
    }
}

It will extract “in the garden”.

Bye Marco...

Badge +1

Hi @mbellei,

Thank you very much for your answer.

I’m aware of how tagger rules work. By the way, this even works with LEMMA.

Considering the sentence “A bloody hell softly cried out in a poetic language, a journey to remember the extinct past.”

With the following rule, we can tag ‘poetic language’:

SCOPE SENTENCE
{
    TAGGER()
    {
        @SOMETAG[LEMMA("poetic")]|[#1]
        >
        @SOMETAG[LEMMA("language")]|[#2]
    }
}

Now I’m wondering whether this functionality is also available in JavaScript through the onTagger() function.

 

Thanks!

Badge

An addition to Philipp’s question on tagger scripting:

When we use manual, rule-based tagging with rules like this:

TAGS {
    @TESTTAG,
    @scriptTag
}

SCOPE SENTENCE
{
    TAGGER() {
        @TESTTAG[LEMMA("new")]|[#1]
        >> @TESTTAG[LEMMA("powerful")]|[#2]
        >> @TESTTAG[LEMMA("development environment")]|[#3]
    }
}

we find a single tag entry covering the whole sequence (token_begin 5 to token_end 7) in gen/TEST.txt.ctx.json:

"tagger": [
  {
    "level": 10000,
    "tags": [
      {
        "tag": "TESTTAG",
        "syncon": -1,
        "level": 0,
        "gt": "",
        "value": "new powerful development environment",
        "pos": 23,
        "len": 36,
        "token_begin": 5,
        "token_end": 7,
        "rules": [
          0
        ],
        "deleted": false
      }
    ]
  }
]
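The rule-based entry above carries its own span boundaries (`token_begin`, `token_end`). As a sketch, the plain JavaScript below (run in Node.js on the JSON output, not inside Studio) lists each rule-based tag together with the tokens it covers. `listTagSpans` is a hypothetical helper, and it assumes `token_end` is inclusive, which fits this example: three matched lemmas (“new”, “powerful”, “development environment”) for tokens 5 to 7.

```javascript
// Hypothetical helper (not a Studio API): summarize the rule-based "tagger"
// section of a .ctx.json output. Assumes token_begin/token_end are inclusive.
function listTagSpans(taggerSection) {
  const spans = [];
  for (const level of taggerSection) {
    for (const t of level.tags) {
      if (t.deleted) continue; // skip tags marked as deleted
      spans.push({
        tag: t.tag,
        text: t.value,
        tokens: t.token_end - t.token_begin + 1, // number of tokens in the span
        tokenBegin: t.token_begin,
        tokenEnd: t.token_end,
      });
    }
  }
  return spans;
}

// The "tagger" array from the gen/TEST.txt.ctx.json example in this thread.
const tagger = [
  {
    level: 10000,
    tags: [
      {
        tag: "TESTTAG",
        syncon: -1,
        level: 0,
        gt: "",
        value: "new powerful development environment",
        pos: 23,
        len: 36,
        token_begin: 5,
        token_end: 7,
        rules: [0],
        deleted: false,
      },
    ],
  },
];

console.log(listTagSpans(tagger));
```

One entry, one span: the rule engine itself records where the multi-token tag starts and ends.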

 

When we tag by script instead, like this in main.jr:

function onTagger() {
    // Adds the tag "scriptTag" to every token of sentence 0 (the first sentence)
    DIS.tagSentence(0, "scriptTag");
}

we find the tag “scriptTag” attached to each individual token of the first sentence in the generated JSON output for our test document (gen/TEST.txt.ctx.json):

"script_info": {
  "onTagger": [
    {
      "type": "TAG",
      "token_id": 0,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 1,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 2,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 3,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 4,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 5,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 6,
      "value": "scriptTag"
    },
    {
      "type": "TAG",
      "token_id": 7,
      "value": "scriptTag"
    },
    ...

Our problem now is how to add a single tag to a token sequence via scripting, as is possible with rules.

Badge +1

No, actually you can’t do this kind of tagging with jscript. You can only add, modify or remove tags from the existing tokenization.
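Since script tags are attached token by token, one possible workaround, assuming the spans are only needed downstream, is to regroup the per-token entries after analysis. The sketch below is plain JavaScript operating on the `script_info.onTagger` array shown earlier in the thread; `mergeTokenTags` is a hypothetical helper, not a Studio function, and it does not create a real multi-token tag inside the engine.

```javascript
// Hypothetical post-processing helper (plain JavaScript, not part of the Studio
// scripting API): it runs on the generated JSON after analysis and groups the
// per-token "TAG" entries of script_info.onTagger into contiguous spans.
function mergeTokenTags(onTaggerEntries) {
  // Keep only TAG entries (filter returns a copy); sort by tag value, then token id.
  const tags = onTaggerEntries
    .filter((e) => e.type === "TAG")
    .sort((a, b) => a.value.localeCompare(b.value) || a.token_id - b.token_id);

  const spans = [];
  for (const e of tags) {
    const last = spans[spans.length - 1];
    // Extend the current span if the same tag continues on the next token
    // (or repeats on a token already covered); otherwise start a new span.
    if (last && last.tag === e.value && e.token_id <= last.tokenEnd + 1) {
      last.tokenEnd = Math.max(last.tokenEnd, e.token_id);
    } else {
      spans.push({ tag: e.value, tokenBegin: e.token_id, tokenEnd: e.token_id });
    }
  }
  return spans;
}

// Usage: tokens 0-7 tagged as in the thread, plus a hypothetical extra token 10.
const entries = [0, 1, 2, 3, 4, 5, 6, 7, 10].map((id) => ({
  type: "TAG",
  token_id: id,
  value: "scriptTag",
}));
console.log(mergeTokenTags(entries)); // two spans: tokens 0-7 and token 10
```

The limitation is built in: per-token entries carry no span boundaries, so two adjacent sequences that were meant as separate tags come out as one span. Only rule-based tagging records the boundaries explicitly.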

Badge

It’s a pity that tagging by script is so much more constrained than tagging by rules.

I think these constraints ought to be mentioned in the documentation: https://docs.expert.ai/studio/latest/languages/tagging/#tagging-by-script

It would also be worthwhile to add a note to the documentation on tagging multi-word sequences by rules. The example code could be taken from this thread.
