Solved

How to use RegEx while writing the rules for extracting a sender name

  • 16 November 2021


Hi, I would like to know how to use a RegEx pattern while writing rules in the expert.ai studio. Here is an example of the sender name that I would like to retrieve from an email:

From: Venkat from Raedan AI <venkat@raedanai.com>

I would like to retrieve "Venkat from Raedan AI", and I think I can only do this by using a RegEx pattern in my rules.

I am new to RegEx and would like to know how to write a rule that extracts "Venkat from Raedan AI", the text between "From:" and "<".

Any help would be appreciated.

Cheers,

Venkat


Best answer by Nico 18 November 2021, 16:17



Hi Venkat,

There are many ways to do that, depending on how strict you want the rule to be. This is a simple rule that does the trick:

SCOPE SENTENCE
{
    IDENTIFY(EMAIL)
    {
        @SenderName[PATTERN("From: ([^<]{2,50}) <[^@]+@[^>]+>")]
    }
}

Given a sentence like 

From: Venkat from Raedan AI <venkat@raedanai.com>

this will extract exactly

Venkat from Raedan AI

Here’s how the PATTERN content works (ignore the parentheses for the moment):

  • First it matches “From: ” (including the blank space at the end).
  • Then it looks for [^<]{2,50}, a sequence of 2 to 50 characters (the limits are just an example), which is the name. Any character is fine except “<”: [^<] means “the set of all characters excluding <”, and {2,50} gives the minimum and maximum number of repetitions.
  • After that it expects a space, then the “<”.
  • Then a sequence of characters [^@]+ excluding “@” (same idea as above); the + means “one or more”.
  • Then the actual “@”.
  • Then again [^>]+, a sequence of characters excluding “>”.
  • And finally the “>”.

So this pattern will match your whole line, “From: …...com>”. The informal meaning is: look for “From: “ then some text (not too much), then a “<”, then again some text, a “@”, some text again and finally a “>”.
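If you want to try the pattern outside the studio, here is a minimal sketch in Python. It assumes Python's re module treats this particular pattern the same way PATTERN does, which should hold for these basic constructs but is not guaranteed for every feature. The name SENDER_LINE is made up for the example, and the pattern is split over several lines with one comment per piece, mirroring the list above.

```python
import re

# Same pattern as in the rule, written in verbose mode so each piece can be
# commented. In verbose mode a bare space is ignored, so the two literal
# spaces are written as [ ].
SENDER_LINE = re.compile(
    r"""
    From:[ ]        # the literal From: followed by one space
    ([^<]{2,50})    # the name: 2 to 50 characters, none of them <
    [ ]<            # a space, then the opening <
    [^@]+           # one or more characters that are not @
    @               # the literal @
    [^>]+           # one or more characters that are not >
    >               # the closing >
    """,
    re.VERBOSE,
)

line = "From: Venkat from Raedan AI <venkat@raedanai.com>"

# fullmatch succeeds only if the pattern covers the entire line.
whole_line_match = SENDER_LINE.fullmatch(line)
print(whole_line_match is not None)
```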

Now, the two parentheses in the pattern define a capturing group. In this case it’s ([^<]{2,50})

And this is what the rule extracts. The rule matches the whole line, but extracts only what is inside the capturing group. In this case it’s the sequence of characters (from 2 to 50) after the “From: ” and before the “ <”.
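The difference between the whole match and the capturing group can be checked with a short Python sketch. This illustrates the regular expression only, not the expert.ai rule engine; the variable names are made up for the example.

```python
import re

pattern = re.compile(r"From: ([^<]{2,50}) <[^@]+@[^>]+>")
line = "From: Venkat from Raedan AI <venkat@raedanai.com>"

match = pattern.search(line)

# group(0) is everything the pattern matched: here, the whole line.
whole_match = match.group(0)

# group(1) is only the text inside the first pair of parentheses.
sender_name = match.group(1)

print(whole_match)
print(sender_name)  # Venkat from Raedan AI
```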

Of course, you can make this RegExp stricter to avoid false positives, for example by limiting the maximum length of some parts or the set of characters allowed in them. I kept it simple and readable as a starting point.
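As an illustration of what “stricter” could look like, here is a sketch in Python. The character sets and length limits in STRICT are my own assumptions for the example, not an official definition of a valid address, and I have not tried this variant inside a PATTERN, so check it against the documentation before using it in a rule.

```python
import re

# The simple pattern from the rule above.
LOOSE = re.compile(r"From: ([^<]{2,50}) <[^@]+@[^>]+>")

# A hypothetical stricter variant: anchored to the start and end of the line,
# and with the two parts of the address limited to letters, digits and a few
# punctuation characters, with a maximum length for each part.
STRICT = re.compile(r"^From: ([^<]{2,50}) <[\w.+-]{1,64}@[\w.-]{1,255}>$")

good = "From: Venkat from Raedan AI <venkat@raedanai.com>"
noisy = "From: Someone <not an email@ some text>"

loose_hits_noisy = LOOSE.search(noisy) is not None    # loose pattern accepts it
strict_hits_noisy = STRICT.search(noisy) is not None  # strict pattern rejects it
strict_name = STRICT.search(good).group(1)

print(loose_hits_noisy, strict_hits_noisy, strict_name)
```

The price of the stricter version is readability, so it is worth adding constraints only where you actually see false positives.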

The documentation about PATTERN is here: Regular expressions overview - expert.ai languages reference.

Some more info about capturing groups is here: Groups peculiarities - expert.ai languages reference.

Please note that there are plenty of sites on the Internet with tutorials and reference material on Regular Expressions; they will certainly help!

Nico
