The problem .split() can’t solve
Lesson 1 extracted a serial with line.split()[3] — which works while the
serial is always the fourth word. But show output shifts: fields go
missing, columns vary by platform, and the thing you want floats somewhere
in a paragraph. What stays constant is the shape of the data — an IPv4
address is always four number-groups joined by dots, no matter where it
sits. Regular expressions let you describe that shape:
import re
arp_line = "Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10"
m = re.search(r"\d+\.\d+\.\d+\.\d+", arp_line)
if m:
    print(m.group())  # 10.20.30.1
Two things to lock in immediately:
- re.search() scans the whole string for the first place the pattern fits and returns a match object, or None if nothing fits.
- Patterns are raw strings: r"\d+", with the r prefix. Regex is built on backslashes, and so are Python string escapes; the r tells Python to pass your backslashes through untouched instead of interpreting them first.
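A small sketch of why the r prefix matters. It uses \b, which is not part of this lesson's vocabulary and is borrowed here only because both Python and regex give it a meaning: a backspace character in an ordinary Python string, a word boundary in a regex.

```python
import re

line = "Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10"

# In an ordinary string, Python turns "\b" into a single backspace
# character before the regex engine ever sees the pattern.
plain = "\bVlan10"
raw = r"\bVlan10"

print(len(plain))  # 7: one backspace character plus "Vlan10"
print(len(raw))    # 8: a backslash, a "b", then "Vlan10"

# The raw pattern reaches re intact, where \b means "word boundary".
raw_hit = re.search(raw, line)
# The plain pattern looks for a literal backspace, which the line lacks.
plain_hit = re.search(plain, line)

print(raw_hit is not None)    # True
print(plain_hit is not None)  # False
```

Many regex escapes, \d included, happen to survive without the r prefix, which is exactly why the habit of always writing it is safer than remembering which ones do not.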
The vocabulary (you need less than you think)
| Pattern | Means | Network example |
|---|---|---|
| `\d` | a digit | `\d+`: a VLAN ID |
| `\w` | letter, digit, or `_` | part of a hostname |
| `\s` / `\S` | whitespace / NON-whitespace | `\S+`: "one word", your workhorse |
| `+` / `*` | one-or-more / zero-or-more of the previous thing | `\d+`: 1 to many digits |
| `?` | the previous thing is optional | `Gig?` matches `Gi` or `Gig` |
| `{4}` | exactly 4 of the previous thing | `[0-9a-f]{4}`: one MAC chunk |
| `[abc]` / `[0-9a-f]` | any one character from the set | hex digits |
| `^` / `$` | start / end of the string | `^interface`: line opens a stanza |
| `.` | any single character (except a newline) | the dot in an IP must be `\.` (escaped!) |
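
A few rows of the table, run against short sample strings, as a rough sanity check. The sample strings and variable names are invented for illustration.

```python
import re

# ? makes only the character just before it optional:
# "Gig?" is "Gi" plus an optional "g".
short_name = re.search(r"Gig?", "Gi1/0/1").group()
long_name = re.search(r"Gig?", "GigabitEthernet1/0/1").group()
print(short_name, long_name)  # Gi Gig

# {4} demands exactly four characters from the set.
mac_chunk = re.search(r"[0-9a-f]{4}", "0024.c4e9.48ae").group()
print(mac_chunk)  # 0024

# ^ pins the pattern to the start of the string.
opens_stanza = re.search(r"^interface", "interface Vlan10") is not None
mid_line = re.search(r"^interface", "no interface Vlan10") is not None
print(opens_stanza, mid_line)  # True False

# An unescaped . accepts any character; \. accepts only a literal dot.
loose = re.search(r"10.20", "10x20") is not None
strict = re.search(r"10\.20", "10x20") is not None
print(loose, strict)  # True False
```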
Three patterns worth memorizing because you’ll type them for the rest of your career:
ip_pattern = r"\d+\.\d+\.\d+\.\d+" # IPv4 (practical form)
mac_pattern = r"[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}" # Cisco MAC
word = r"\S+" # "the next field"
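"Practical form" means the IPv4 pattern checks shape, not value: it accepts any four dot-separated digit groups, including ones no real address can hold. The sketch below shows that, then filters with the standard library's ipaddress module. The helper name is_real_ipv4 and the second sample line are hypothetical, made up for this illustration.

```python
import ipaddress
import re

ip_pattern = r"\d+\.\d+\.\d+\.\d+"  # the lesson's practical IPv4 pattern

def is_real_ipv4(candidate):
    """Return True if the string parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False

samples = [
    "Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10",
    "build 999.1.1.1 of some tool",  # invented line: right shape, wrong values
]

results = []
for text in samples:
    m = re.search(ip_pattern, text)
    if m:
        results.append((m.group(), is_real_ipv4(m.group())))

print(results)  # [('10.20.30.1', True), ('999.1.1.1', False)]
```

For device output that only ever contains real addresses, the shape check alone is usually enough; the extra validation earns its keep when the text comes from less trustworthy sources.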
Capture groups: extracting, not just finding
Wrap part of a pattern in parentheses and the match object remembers what that part matched. This is how a find becomes an extract:
line = "Processor board ID FOC2217A0AB"
m = re.search(r"Processor board ID (\S+)", line)
if m:
    serial = m.group(1)  # 'FOC2217A0AB'
group(0) (or plain group()) is everything the pattern matched;
your parentheses are numbered from 1, left to right. Two groups pull two
fields at once — here, interface name and IP from show ip interface brief:
line = "GigabitEthernet1/0/1 10.20.30.1 YES manual up up"
m = re.search(r"^(\S+)\s+(\d+\.\d+\.\d+\.\d+)", line)
if m:
    intf, ip = m.group(1), m.group(2)
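To make the numbering concrete, here is a sketch that pulls three fields from the ARP line used earlier and prints group(0) next to the numbered groups. It also uses m.groups(), a match-object method the lesson does not cover, which returns all captured groups as one tuple.

```python
import re

arp_line = "Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10"

# Three capture groups: IP, MAC, interface. The age field (\S+) is
# matched but not captured because it has no parentheses.
m = re.search(r"(\d+\.\d+\.\d+\.\d+)\s+\S+\s+(\S+)\s+ARPA\s+(\S+)", arp_line)
if m:
    print(m.group(0))  # 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10
    print(m.group(1))  # 10.20.30.1
    print(m.group(3))  # Vlan10
    ip, mac, intf = m.groups()  # every captured group, as one tuple
    print(ip, mac, intf)  # 10.20.30.1 0024.c4e9.48ae Vlan10
```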
re.findall(): harvest everything at once
Where search finds the first match, findall returns a list of all of
them: no match objects, just the matched strings (or the captured text if
the pattern has exactly one group, or a tuple per match if it has several):
output = """
Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10
Internet 10.20.30.45 12 6c41.0e9a.1f02 ARPA Vlan10
Internet 10.20.31.1 8 0024.c4e9.51bb ARPA Vlan20
"""
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", output)
# ['10.20.30.1', '10.20.30.45', '10.20.31.1']
One line, every IP in an ARP table. Wrap it in a set from Lesson 4,
set(re.findall(...)), and the harvest is deduplicated as well.
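A sketch of how capture groups change what findall hands back, using the same ARP output. The patterns for the VLAN number and the (IP, VLAN) pair are illustrative choices, not part of the lesson's exercises.

```python
import re

output = """
Internet 10.20.30.1 4 0024.c4e9.48ae ARPA Vlan10
Internet 10.20.30.45 12 6c41.0e9a.1f02 ARPA Vlan10
Internet 10.20.31.1 8 0024.c4e9.51bb ARPA Vlan20
"""

# One group: findall returns only what the group captured.
vlans = re.findall(r"Vlan(\d+)", output)
print(vlans)  # ['10', '10', '20']

# set() drops the duplicates; sorted() gives back a stable order.
unique_vlans = sorted(set(vlans))
print(unique_vlans)  # ['10', '20']

# Two groups: findall returns one tuple per match.
# .* stays on the same line because . does not match a newline.
pairs = re.findall(r"(\d+\.\d+\.\d+\.\d+).*Vlan(\d+)", output)
print(pairs)
# [('10.20.30.1', '10'), ('10.20.30.45', '10'), ('10.20.31.1', '20')]
```

Note that a set has no guaranteed order, so anything that depends on the original sequence should keep the list and deduplicate another way.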
Exercises (graded)
cd labs/python-foundations/lesson05
pytest -q
First lab that needs an import — put import re at the top of
exercises.py. Five functions:
- find_serial(text): the serial from a Processor board ID line, or None
- find_all_ips(text): every IPv4 address in a blob, in order
- find_macs(text): every Cisco-format MAC (aaaa.bbbb.cccc)
- interface_ip(line): (name, ip) tuple from a show ip int brief line, or None
- is_valid_hostname(name): enforce the naming standard: starts with a lowercase letter, then lowercase letters, digits, or hyphens only
Why are regex patterns written as raw strings, like r"\d+"?
m = re.search(r"uptime is (.+)", line) then m.group(1) crashes with AttributeError: 'NoneType' object has no attribute 'group'. What happened?
What does re.findall(r"\d+\.\d+\.\d+\.\d+", arp_output) return when the output holds three ARP entries?
Summary
Regex describes the shape of data instead of its position: \d, \S,
quantifiers, and anchors cover most of what network text demands, written
always as raw strings. re.search() plus a capture group turns finding
into extracting, guarded by if m: because the NoneType-has-no-group
crash is one of the most common regex bugs. re.findall() harvests every
match in one pass, and knowing when to graduate from hand-rolled patterns
to TextFSM templates is itself a professional skill, one the flagship
course builds on directly from this lesson. Next up: functions, which
package the parsers you've been writing into tools you can reuse.