Life is very easy with Python: January 2012

Saturday, January 28, 2012

Python Beautiful Soup: extract URLs from a web page

# Python 2 with Beautiful Soup 3 (the "BeautifulSoup" module, not "bs4")
from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib2

def get_url_content(site_url):
    """Return the page body, or the error page body on an HTTP error."""
    try:
        f = urllib2.urlopen(site_url)
        content = f.read()
        f.close()
    except urllib2.HTTPError, error:
        content = error.read()
    return content

response = get_url_content('http://www.sust.edu/')

# Parse only the <a> tags and print the target of each link
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']
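
The script above targets Python 2 and Beautiful Soup 3. As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package is installed (it is imported as bs4), the version below should behave similarly. The helper name extract_links and the choice of the built-in "html.parser" backend are assumptions made for illustration, not part of the original post.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

from bs4 import BeautifulSoup, SoupStrainer


def get_url_content(site_url):
    """Return the page body as bytes; on an HTTP error, return the error body."""
    try:
        with urlopen(site_url) as f:
            return f.read()
    except HTTPError as error:
        return error.read()


def extract_links(html):
    """Return the href value of every <a> tag that has one (hypothetical helper)."""
    # Parse only the <a> tags, as the original does with parseOnlyThese.
    only_a_tags = SoupStrainer("a")
    soup = BeautifulSoup(html, "html.parser", parse_only=only_a_tags)
    # href=True skips anchors that carry no href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for href in extract_links(get_url_content("http://www.sust.edu/")):
        print(href)
```

The extraction step is kept separate from the download so it can be tried on any HTML string without network access.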

Output:

The href value of every link found on the page, one per line.

Beautiful Soup for Python: Install

Beautiful Soup is an HTML/XML parser for Python that can turn even invalid markup into a parse tree. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
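
To make "navigating, searching, and modifying" concrete, here is a minimal sketch on Python 3, assuming beautifulsoup4 is installed. The sample markup and variable names are invented for illustration; the <b> tag is left unclosed on purpose to show that invalid markup still yields a usable tree.

```python
from bs4 import BeautifulSoup

# Deliberately invalid markup: the <b> tag is never closed.
html = '<p>Hello <b>world</p><a href="/old">link</a>'
soup = BeautifulSoup(html, "html.parser")

# Navigating: step from the document to <p>, then to the <b> inside it.
bold_text = soup.p.b.string

# Searching: collect every <a> tag in the document.
anchors = soup.find_all("a")

# Modifying: rewrite an attribute in place.
anchors[0]["href"] = "/new"

print(bold_text)
print(anchors[0]["href"])
```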

pip command

pip install beautifulsoup4

Install Steps (from a downloaded archive):

- Download the library from here.
- Extract the archive.
- In a command prompt, cd to the extracted directory.
- Run the command python setup.py install
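
After either install route, a quick way to confirm that Python can find the package is to look it up by its import name. This is a small sketch using only the standard library on Python 3; the helper name is_installed is hypothetical. Note that the beautifulsoup4 package is imported as bs4, while the older Beautiful Soup 3 is imported as BeautifulSoup.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True once beautifulsoup4 is installed for this interpreter.
    print(is_installed("bs4"))
```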
