Thread: Question for python regex: matching double backslashes '\\'
Hi,
I'm writing a script to scrape URLs from a website. The following is the string I'm trying to match (it is found in the page source read with urllib2.urlopen(webpage).read()):
Code:
"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"
The regex search in Python is:
Code:
re.search('"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"',html)
where html is the page source of the webpage I'm interested in.
I get an error saying: unexpected end of regular expression.
If I change the regex from,
'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
to
'"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'
it matches perfectly. I don't understand why I have to match 2 backslashes as opposed to a single literal backslash. Shouldn't a literal backslash ('\\') match every single backslash in the page source?
Any help is appreciated.
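Python uses the backslash as a special escaping character in string literals, so 2 backslashes expand to one: "\\" expands to "\", and "\\\\" expands to "\\". The regex engine then applies its own backslash escaping to whatever string it receives, so it needs "\\" to match one literal backslash. To avoid needing ridiculous amounts of backslashes, you can use a raw string:
Code:
r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'
The 'r' before the string causes Python to interpret the string literally, ignoring escape sequences.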
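To make the two layers of escaping concrete, here is a minimal sketch. It assumes the page source contains the backslashes exactly as quoted in the question; the variable names (html, no_backslash, four_backslashes, with_backslash) are only illustrative, and html is a stand-in sample rather than a real download.

```python
import re

# Stand-in for the page source (assumption: copied from the string quoted
# in the question). The raw string keeps every backslash exactly as typed.
html = r'"stream_h264_url":"http:\\/\\/www.dailymotion.com\\/cdn\\/h264-512x384\\/video\\/xu41s3.mp4?auth=1349806412-337e4c35a8590a1dabc2761376070386"'

# Layer 1: Python string literals. "\\" in a normal literal is ONE backslash.
assert len('\\') == 1
assert '\\\\' == r'\\'  # four typed in a normal literal == two in a raw literal

# Normal literal with two backslashes: the regex engine receives [-\/a-z...],
# where \/ is just an escaped slash, so the class contains no backslash.
no_backslash = '"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'

# Normal literal with four backslashes: the regex engine receives [-\\/a-z...],
# where \\ is one literal backslash inside the class.
four_backslashes = '"stream_h264_url":"http:[-\\\\/a-zA-Z0-9?=.]+"'

# Raw literal with two backslashes: same pattern as above, easier to read.
with_backslash = r'"stream_h264_url":"http:[-\\/a-zA-Z0-9?=.]+"'

first = re.search(no_backslash, html)
second = re.search(with_backslash, html)

print(first)                               # None: no backslash in the class
print(second.group(0) == html)             # True: the whole sample matches
print(four_backslashes == with_backslash)  # True: identical patterns
```

Printing a pattern with print(with_backslash) before passing it to re.search is a quick way to see what the regex engine will actually receive.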