
Thread: Question for python regex: matching double backslashes '\\'


Hi,
I'm writing a script to scrape URLs from a website. The following is the string I'm trying to match (it is found in the page source read with urllib2.urlopen(webpage).read()):
code:
"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"
The regex search in Python is:
code:
re.search('"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"',html)
where html is the page source of the webpage I'm interested in.

I get an error saying: unexpected end of regular expression.
If I change the regex from

'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'

to

'"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'

it matches perfectly. I don't understand why I have to match 2 backslashes as opposed to a single literal backslash. Shouldn't a literal backslash ('\\') match every single backslash in the page source?

Any help is appreciated.

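Python uses the backslash as a special escaping character in string literals. 2 backslashes expand to one, so "\\" expands to "\", and "\\\\" expands to "\\". The regex engine then treats the backslash as an escape character too, so the pattern it receives needs two backslashes to match one literal backslash. To avoid needing ridiculous amounts of backslashes, you can use a raw string: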
code:
r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
The 'r' before the string causes Python to interpret the string literally, ignoring escape sequences.
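
Here is a small sketch of the two layers of escaping, in case it helps. It assumes Python 3 and that the page source really contains two backslashes before each slash, as in the sample posted in the question. The variable names (two_plain, four_plain, two_raw and so on) are made up for illustration only.

```python
import re

# Sample from the question. The raw string keeps every backslash, so the
# text contains the characters \\/ exactly as they appear in the post.
html = r'"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"'

# Layer 1: the string literal. In a plain literal each pair of backslashes
# collapses to a single backslash before the regex engine sees the pattern.
two_plain = '"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
four_plain = '"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'
two_raw = r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'

# The plain four-backslash literal and the raw two-backslash literal are
# the same string once Python has parsed them.
same_pattern = (four_plain == two_raw)

# Layer 2: the regex engine. With two_plain it receives \/ (an escaped
# slash), so the character class contains no backslash at all.
match_two_plain = re.search(two_plain, html)

# With two_raw it receives \\ (one literal backslash) followed by /.
match_two_raw = re.search(two_raw, html)

print(same_pattern)
print(match_two_plain)
print(match_two_raw.group(0) == html)
```

In this sketch the plain two-backslash pattern does not raise an error, it just returns None, because the first character after http: is a backslash and the class cannot match it. The raw-string pattern matches the whole sample.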

