AtMega

March 24, 2014

Hi,
I'm writing a script to scrape URLs from a website. The following is the string I'm trying to match (it is found in the page source read with urllib2.urlopen(webpage).read()):
code:
"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"
The regex search in Python is:
code:
re.search('"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"',html)
where html is the page source of the webpage I'm interested in.

The error says: unexpected end of regular expression.
If I change the regex from

'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'

to

'"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'

it matches perfectly. I don't understand why I have to match 2 backslashes as opposed to a single literal backslash. Shouldn't a literal backslash ('\\') match every single backslash in the page source?

Any help is appreciated.

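Python uses the backslash as a special escaping character in string literals, so 2 backslashes expand to one: "\\" becomes "\", and "\\\\" becomes "\\". The regex engine then applies its own escaping to whatever is left, so the two-backslash pattern reaches it as \/ (just an escaped slash), while the four-backslash pattern reaches it as \\/ (a literal backslash followed by a slash). To avoid needing ridiculous amounts of backslashes, you can use a raw string:
code:
r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
The 'r' before the string causes Python to interpret the string literally, ignoring escape sequences.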
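
To see the two layers of escaping side by side, here is a minimal sketch. It assumes the page source contains the string exactly as quoted in the question (the html value below is a stand-in, not a live download), and the names two, four and raw are made up for illustration.

```python
import re

# Stand-in for the page source, copied from the post. The raw string keeps
# every backslash exactly as written (an assumption about the real page).
html = r'"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"'

# Layer 1: the string literal. Two typed backslashes become one character.
assert len('\\') == 1
assert len('\\\\') == 2

two = '"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'     # regex sees \/
four = '"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'  # regex sees \\/
raw = r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'    # regex sees \\/

# The raw string and the four-backslash string are the same pattern.
print(four == raw)

# Layer 2: the regex engine. \/ is only an escaped slash, so the class has
# no backslash in it and the search stops at the first backslash after http:
print(re.search(two, html))

# \\ is a literal backslash, so the class now covers the whole URL.
match = re.search(raw, html)
print(match.group(0) == html)
```

In this sketch the three print calls show True, None and True: the raw pattern is character-for-character the same as the four-backslash one, and only the patterns that put a real backslash inside the character class get past the first backslash in the URL.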


Thread: Question for python regex: matching double backslashes '\\'
