Python 2.7 Tutorial Pt 14

I show you how to use Python to strip HTML tags from articles you obtained through website scraping.
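
As a rough illustration of the idea, here is a minimal sketch of stripping tags with a regular expression. The helper name `strip_tags` and the sample string are made up for this example and are not taken from the tutorial's code. A pattern like this is usually good enough for short feed snippets, but it is not a full HTML parser.

```python
import re

# Matches anything that looks like a tag: "<", one or more non-">" characters, then ">"
TAG_PATTERN = re.compile(r'<[^>]+>')


def strip_tags(html):
    """Return html with tag-like substrings removed (hypothetical helper)."""
    return TAG_PATTERN.sub('', html)


# Sample input invented for this sketch
sample = '<p>Scraped <a href="http://example.com">article</a> text</p>'
print(strip_tags(sample))
```

Compiling the pattern once and reusing it keeps the helper cheap when it is called on many snippets in a loop.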

Code is here


Comments

22 responses to “Python 2.7 Tutorial Pt 14”

  1. Derek Banas Avatar

    I actually use PHP most of the time, but Beautiful Soup for Python has improved lately and is quite good.

  2. emgoldex greeceEmgoldex Avatar

    Hello again, it's been a while. I was wondering which is the best method to use for web scraping: curl? Beautiful Soup? get_html? For example, I can block curl requests to my site through the config.ini. So I want to start scraping, but I don't know which is the right or best method to use.

  3. Sainath S Avatar

    Hi Derek,
    I have a question: how do I pass credentials when scraping a website?

  4. Herp Derpingson Avatar

    from bs4 import BeautifulSoup

  5. Derek Banas Avatar

    They may have changed the tags a bit. Check whether the tag around the snippet has changed.

  6. theLach1234 Avatar

    I use your exact code but I only get the links and the titles. The code fails to output the snippet of the article. Any help? Has the feed for Huffington Post changed?

  7. pavanjared Avatar

    What'd you do to fix this error importing BS?

  8. Derek Banas Avatar

    I have a bunch of tutorials on scraping web pages with PHP. They are in my PHP tutorial playlist on my YouTube channel.

  9. Marion Dumas Avatar

    Hello! I am wondering whether you have or know of a tutorial on scraping pages that are auto-generated with JavaScript.

  10. Derek Banas Avatar

    Sorry, but I'd have to know more about how that information is checked.

  11. harendra Singh Avatar

    My network is behind a proxy, so when I open a webpage it asks me for a username and password. Is there any way I can store the username and password in the program itself so that it doesn't ask me?
    I searched and tried the urllib2 proxy handlers, but got an error.

  12. Derek Banas Avatar

    Send me an email and I'll see if I can help: derekbanas@verizon.net

  13. Paula S F Avatar

    Hi Derek. I need your help. Do you have an email? I will write a lot. Hope you answer.

  14. AlucardHelIsing Avatar

    Figured it out. Now I'm just getting errors with re.findall giving a

    TypeError: Expected string or buffer

  15. Derek Banas Avatar

    Are you on a Mac or PC?

  16. AlucardHelIsing Avatar

    My only question is how to make Eclipse recognize the BeautifulSoup download. I used 'python setup.py install' in the terminal, so where do these files have to go? Where do I have to put beautifulsoup.py or the other files that came with the install? As you would expect, in Eclipse I am getting an error:
    Unresolved import: BeautifulSoup

  17. Derek Banas Avatar

    @entrevu To scrape anything you just need the basic concepts I covered here, with a better understanding of regular expressions. I did a tutorial in PHP that covers advanced website scraping called Web Design and Programming Pt 24. The regular expression explanation there applies to regex in Python as well. I hope that helps.

  18. entrevu Avatar

    @ma1achite He's using Eclipse. Google "Eclipse IDE".

  19. Derek Banas Avatar

    @0Allhell Perform a view source in the browser to find out which tags you need to target. You can scrape anything that shows on the screen.

  20. Mathew Esquibel Avatar

    I am currently trying to scrape a friends list for a gaming console. The only problem, I think, is that it reads the page before the JavaScript has finished. Do you know a way to scrape it afterwards? Thanks. Nice tutorials.

  21. Derek Banas Avatar

    @ma1achite I use Eclipse Classic. It's free and works with most every language.
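
The TypeError from re.findall mentioned in the comments above typically shows up when the second argument is not a string, for example a list of fragments or a parsed tag object. The sketch below reproduces that situation and shows one possible fix. The variable names and sample data are invented here, and the commenter's actual cause is not known.

```python
import re

# Hypothetical scraped data: a list of HTML fragments rather than one string
fragments = ['<a href="/one">One</a>', '<a href="/two">Two</a>']
link_pattern = re.compile(r'href="([^"]+)"')

try:
    # Passing the list itself raises TypeError, because re expects a string
    re.findall(link_pattern, fragments)
except TypeError as error:
    print('TypeError: %s' % error)

# One fix: search each fragment separately, converting it to a string first
links = []
for fragment in fragments:
    links.extend(link_pattern.findall(str(fragment)))
print(links)
```

Joining the fragments into a single string before searching would work as well. Looping keeps each match tied to the fragment it came from.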
