Regex to find name in sentence

I have some sentences like these:

1:

"RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."

2:

"Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

I need to use a Python regex to find the names "Oubre Jr." and "Nurkic" in sentence 1, and "Nurkic" and "Wall" in sentence 2.

p = r'\s*(\w+?)\s[(]' 

With this pattern I get ['Nurkic', 'Wall'] for sentence 2, but for sentence 1 I only get ['Nurkic']; it misses "Oubre Jr.".

Who can help me?


3 Answers

Here is one approach:

import re

line = "RLB shows Oubre Jr (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
# Double quotes allow the apostrophe in the class; re.I is dropped so [A-Z] means a capital.
results = re.findall(r"([A-Z][\w']+(?: [JS]r\.?)?)(?= \([A-Z]+\))", line)
print(results)

['Oubre Jr', 'Nurkic']

The pattern matches one name that begins with a capital letter, optionally followed by the suffix Jr or Sr (with or without a trailing period). The lookahead (?= \([A-Z]+\)) then requires a parenthesized team code such as (WAS) to follow, without including it in the match.
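
As a minimal sketch of the logic described above, the same idea can be run against both sentences from the question, this time with the period after "Jr" as in the original text. The helper name find_names and the constant NAME_BEFORE_TEAM are made up for illustration, and the pattern is one possible reading of the description, not the only one.

```python
import re

# Hypothetical names; the pattern follows the description above:
# capitalized word, optional " Jr"/" Sr" (period optional), then a lookahead
# for a space and a parenthesized uppercase code.
NAME_BEFORE_TEAM = re.compile(r"([A-Z][\w']+(?: [JS]r\.?)?)(?= \([A-Z]+\))")


def find_names(sentence):
    """Return every name that is directly followed by a code like (WAS)."""
    return NAME_BEFORE_TEAM.findall(sentence)


sentence_1 = ("RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), "
              "and a held ball is correctly called.")
sentence_2 = ("Nurkic (POR) maintains legal guarding position and makes "
              "incidental contact with Wall (WAS) that does not affect "
              "his driving shot attempt.")

print(find_names(sentence_1))  # ['Oubre Jr.', 'Nurkic']
print(find_names(sentence_2))  # ['Nurkic', 'Wall']
```

Because the team code sits inside a lookahead, it is checked but never becomes part of the returned name.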

4 months ago

You need a pattern that you can match. For your sentences you could try to match whatever comes directly before (XXX) and include a list of possible "suffixes" to match as well. You would need to collect those suffixes from your sources.

import re

suffs = ["Jr."]  # append more suffixes to this list

# Optional suffix (escaped so "." is literal), then an optional space
rsu = r"(?:" + "|".join(map(re.escape, suffs)) + r")? ?"

# Name word, optional suffix, then a three-character code in parentheses
regex = r"(\w+ " + rsu + r")\(\w{3}\)"

test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

# Group 1 keeps the space before "(", so strip it from each name
names = [match.group(1).strip() for match in re.finditer(regex, test_str)]

print(names)

Output:

['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall']

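This should work as long as each name is a single word made only of \w characters (letters, digits, underscore), optionally followed by a listed suffix; a hyphen or an apostrophe inside a name would truncate the match. If you need to adapt the regex, use https://regex101.com/r/pRr9ZU/1 as a starting point.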
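
One possible way to loosen that restriction is to widen the character class used for the name. The sketch below is an assumption, not part of the answer's regex: the class [\w'-] and the sample sentence (with placeholder codes AAA and BBB) are invented purely to show the difference.

```python
import re

# Hypothetical variant: also allow apostrophes and hyphens inside a name.
strict = re.compile(r"(\w+(?: Jr\.)?) \(\w{3}\)")
loose = re.compile(r"([\w'-]+(?: Jr\.)?) \(\w{3}\)")

# Made-up test sentence, not taken from the question's data.
sample = "O'Neal (AAA) sets a screen on Smith-Jones Jr. (BBB)."

strict_names = [m.group(1) for m in strict.finditer(sample)]
loose_names = [m.group(1) for m in loose.finditer(sample)]

print(strict_names)  # ['Neal', 'Jones Jr.']
print(loose_names)   # ["O'Neal", 'Smith-Jones Jr.']
```

The \w-only version starts matching after the apostrophe or hyphen, which is why it returns only the tail of each name.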


Explanation:

  • rsu --> all items in the list suffs are joined with | (OR) inside a non-capturing group (?:...), which is made optional with ? and followed by an optional space.
  • regex --> one or more word characters and a space, then the optional suffix group just built, all captured as group 1; after that a literal (, exactly three word characters, and a literal ).
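
To make the two steps concrete, here is a small sketch that prints the assembled pattern and the raw captures. It assumes the suffixes are passed through re.escape so that the period is literal; the shortened sample string is taken from sentence 1 of the question.

```python
import re

suffs = ["Jr."]

# Step 1: optional non-capturing suffix group plus an optional space.
rsu = r"(?:" + "|".join(re.escape(s) for s in suffs) + r")? ?"

# Step 2: wrap word + suffix in a capturing group, then require "(XXX)".
regex = r"(\w+ " + rsu + r")\(\w{3}\)"
print(regex)  # (\w+ (?:Jr\.)? ?)\(\w{3}\)

sample = "Oubre Jr. (WAS) legally ties up Nurkic (POR)"
captured = re.findall(regex, sample)
print(captured)  # ['Oubre Jr. ', 'Nurkic ']
```

The space before the opening parenthesis is inside the capturing group, so the raw captures end with a space and are usually passed through str.strip().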

4 months ago

You can use the following regex:

(?:[A-Z][a-z][\s\.a-z]*)+(?=\s\()

|-----Main Pattern-----|


Details:

  • (?:...)+ - A non-capturing group, repeated 1 or more times
  • [A-Z] - Matches 1 uppercase letter
  • [a-z] - Matches 1 lowercase letter
  • [\s\.a-z]* - Matches whitespace, periods ('.') or lowercase letters, 0 or more times
  • (?=\s\() - Positive lookahead: the main pattern matches only where it is followed by whitespace and '(', neither of which is included in the result
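
The effect of the lookahead can be seen by comparing it with a pattern that consumes the same characters. This is only an illustrative sketch: the name part is simplified to [A-Z][a-z]+ and the short sample string is assembled from names in the question.

```python
import re

# Shortened illustrative string built from names in the question.
text = "Wall (WAS) drives past Nurkic (POR)."

# Lookahead: " (" must follow, but it is not part of the match.
with_lookahead = re.findall(r"[A-Z][a-z]+(?=\s\()", text)

# No lookahead: the same characters are consumed and returned.
without_lookahead = re.findall(r"[A-Z][a-z]+\s\(", text)

print(with_lookahead)     # ['Wall', 'Nurkic']
print(without_lookahead)  # ['Wall (', 'Nurkic (']
```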

import re

# Named "text" rather than "str" to avoid shadowing the built-in type
text = '''RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called.

Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt.'''

res = re.findall(r'(?:[A-Z][a-z][\s\.a-z]*)+(?=\s\()', text)

print(res)

Demo: https://repl.it/@RahulVerma8/OvalRequiredAdvance?language=python3

Match: https://regex101.com/r/OsLTrY/1

4 months ago