Thursday, March 19, 2009

A few helpful regexps for parsing HTML

To remove blank lines
Expression: "[\r\n]+" (replace each match with a single line break)
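A minimal sketch of the blank-line expression in Python, assuming the goal is to collapse each run of line breaks into one "\n". The function name remove_blank_lines and the sample string are made up for illustration.

```python
import re

def remove_blank_lines(text: str) -> str:
    # Hypothetical helper: collapse every run of CR/LF characters
    # into a single "\n", which drops the empty lines in between.
    return re.sub(r"[\r\n]+", "\n", text)

sample = "<p>one</p>\r\n\r\n\r\n<p>two</p>\n\n<p>three</p>"
cleaned = remove_blank_lines(sample)
print(cleaned)
```

Note that a line containing only spaces or tabs is not matched by this expression, so it would survive the cleanup.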

To find a particular script tag
Expression: <script[^>]*?(SCRIPT_FILENAME.*?\s*</script>)
Parameter: SCRIPT_FILENAME
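As a rough illustration, the script-tag expression could be used from Python like this. The helper name find_script_tag, the sample markup, and the choice of the IGNORECASE and DOTALL flags are assumptions, not part of the original expression; re.escape is used so that the dot in the file name is matched literally.

```python
import re
from typing import Optional

def find_script_tag(html: str, script_filename: str) -> Optional[str]:
    # Substitute the SCRIPT_FILENAME parameter into the expression.
    pattern = r"<script[^>]*?(" + re.escape(script_filename) + r".*?\s*</script>)"
    # Flags are an assumption: ignore tag case, let "." span line breaks.
    match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
    return match.group(0) if match else None

page = (
    '<script type="text/javascript" src="js/menu.js"></script>'
    '<script src="js/other.js"></script>'
)
print(find_script_tag(page, "other.js"))
```

Because [^>]*? cannot cross a ">", the file name has to appear inside the opening tag, so the search does not spill over from one script element into the next.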

To find a particular IMG tag
Expression: <img[^>]*?(IMAGE_FILENAME.*?>)
Parameter: IMAGE_FILENAME
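One possible use of the IMG expression is to strip a specific image from a page. The sketch below assumes Python; the helper name remove_img_tag and the sample markup are hypothetical, and the IGNORECASE flag is an added assumption.

```python
import re

def remove_img_tag(html: str, image_filename: str) -> str:
    # Substitute the IMAGE_FILENAME parameter into the expression.
    pattern = r"<img[^>]*?(" + re.escape(image_filename) + r".*?>)"
    # Replace every matching IMG tag with nothing.
    return re.sub(pattern, "", html, flags=re.IGNORECASE)

page = '<p><img class="logo" src="images/logo.png" alt="Logo"><img src="images/photo.jpg"></p>'
stripped = remove_img_tag(page, "logo.png")
print(stripped)
```

The lazy .*?> stops at the first ">" after the file name, so only the tag that mentions the file is removed.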

To find a particular anchor tag by its link text
Expression: <(a)\b[^>]*>REFERENCE_FILE</\1>
Parameter: REFERENCE_FILE
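A small sketch of the anchor expression in Python follows. The helper name find_anchor and the sample markup are hypothetical. As written, the expression compares REFERENCE_FILE with the text between the opening and closing tags, not with the href attribute.

```python
import re
from typing import Optional

def find_anchor(html: str, reference_file: str) -> Optional[str]:
    # \1 refers back to the captured tag name "a", so the closing tag is </a>.
    pattern = r"<(a)\b[^>]*>" + re.escape(reference_file) + r"</\1>"
    match = re.search(pattern, html)
    return match.group(0) if match else None

page = '<a href="docs/readme.txt">readme.txt</a> <a href="x.html">other</a>'
print(find_anchor(page, "readme.txt"))
```

The \b after the tag name keeps the expression from matching longer tag names that merely start with "a", such as abbr.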
