Python Programming Tutorial – 37 – Word Frequency Counter (3/3)


Comments

40 responses to “Python Programming Tutorial – 37 – Word Frequency Counter (3/3)”

  1. June Wang

    for key, value in sorted(word_count.items(),key=operator.itemgetter(1)):
    ^
    IndentationError: unindent does not match any outer indentation level
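
This error usually means a line is indented to a depth that matches none of the enclosing blocks, for example after mixing tabs and spaces or pasting code from a web page. A minimal sketch that reproduces the message on a made-up snippet (the snippet and the `indentation_problem` helper are illustrative, not the commenter's actual file):

```python
# Hypothetical snippet: the `for` line is indented 2 spaces, which matches
# neither the function body (4 spaces) nor the module level (0 spaces).
broken_source = (
    "def create_dictionary(words):\n"
    "    word_count = {}\n"
    "  for key in word_count:\n"
    "        print(key)\n"
)


def indentation_problem(source):
    """Return the IndentationError message for `source`, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as error:
        return error.msg
    return None


message = indentation_problem(broken_source)
print(message)
```

Re-indenting the `for` line so it lines up with `word_count = {}` makes the snippet compile.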

  2. Paweł Brysch

    5:40 – a way to sort a dictionary by an attribute (WOW)

  3. Paweł Brysch

    slownik[przod]=tyl – adding a word to the dictionary

  4. Thomas

    Hello everyone, I'm enjoying the course so far (complete newbie to coding) and can get this code to work for a single page (URL) of a local listings website. However, when I try to pull multiple pages of data into the word counter I keep getting errors no matter how I jig the code. Can someone have a quick look at what might be wrong?

    #Word Counter

    from bs4 import BeautifulSoup
    import requests
    import operator

    max_pages = 1

    def Spider(max_pages):
        page_number = 2
        while page < max_pages:
            url = 'https://www.trademe.co.nz/browse/motors/default.aspx?cid=1&page=' + str(page_number) + '&sort_order=motors_default&rptpath=1-268-'
            page_number = page_number + 1

    def start(url):
        word_list = []
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code, "html.parser")
        for post_text in soup.findAll('a', {'class': 'dotted'}):
            content = post_text.string
            words = content.lower().split()
            for each_word in words:
                word_list.append(each_word)
        clean_up_list(word_list)

    def clean_up_list(word_list):
        clean_word_list = []
        for word in word_list:
            symbols = "!@#$%^&*()~`_-<>?"
            for i in range (0, len(symbols)):
                word = word.replace(symbols[i], "")
            if len(word) > 0:
                print(word)
                clean_word_list.append(word)
        create_dictionary(clean_word_list)

    def create_dictionary (clean_word_list):
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)

    start(url)
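
A few things in the posted code would stop it before any page is fetched: `Spider` is never called, its loop tests `page`, which is never defined, and `url` only exists inside `Spider`, so `start(url)` at the bottom cannot see it. One possible restructuring is sketched below. It assumes pages are numbered from 1 and that the fetching step is wrapped in a function, here the hypothetical `fetch_words(url)`, which would be the body of `start` changed to return `word_list` instead of calling `clean_up_list`.

```python
BASE_URL = "https://www.trademe.co.nz/browse/motors/default.aspx"


def build_page_urls(max_pages):
    """Return one listing URL per page, numbered 1..max_pages (assumed numbering)."""
    urls = []
    for page_number in range(1, max_pages + 1):
        urls.append(
            BASE_URL + "?cid=1&page=" + str(page_number)
            + "&sort_order=motors_default&rptpath=1-268-"
        )
    return urls


def collect_words(urls, fetch_words):
    """Gather the words from every URL into a single list.

    fetch_words is a hypothetical callable: it takes a URL and returns
    the list of words found on that page.
    """
    all_words = []
    for url in urls:
        all_words.extend(fetch_words(url))
    return all_words
```

With that in place, the combined list from `collect_words(build_page_urls(3), fetch_words)` can be passed once to `clean_up_list`, so the counts cover all pages rather than one page at a time.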

  5. Francisco Schweizer

    31.01.2018 - fixed the issues and it is now fully working

    import requests
    from bs4 import BeautifulSoup
    import operator

    def start(url):
        word_list = []
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code, "html.parser")
        c = soup.get_text()
        # lowercase first, then split on whitespace
        words = c.lower().split()
        for each_word in words:
            word_list.append(each_word)
        clean_up_list(word_list)

    def clean_up_list(word_list):
        clean_word_list = []
        for word in word_list:
            symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-='"
            for i in range(0, len(symbols)):
                word = word.replace(symbols[i], "")
            if len(word) > 0:
                clean_word_list.append(word)
        create_dictionary(clean_word_list)

    def create_dictionary(clean_word_list):
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)

    start("https://en.wikipedia.org/wiki/Billionaire_Boys_Club")

  6. Ahmet Özdemir

    Well, the first 2 videos were quite fine, but I can't say the same for the last one. First of all, you created a method called "clean_up_list", which gets rid of all the unwanted symbols in the words. But in this method you also call another method, which creates a dictionary to count frequency. I don't think it is a good idea to call that method inside "clean_up_list", because that method is supposed to clean up words, not create a dictionary. On the other hand, you are using the "operator" module to count frequency, which is actually not a complicated job. I was expecting to see some efficient algorithm to do that.

    In any case, your videos are really simple and easy to understand. I am glad you are doing this. Thank you so much! 🙂
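
The separation suggested here could look roughly like the sketch below: each function returns its result and the caller chains the steps. The function names follow the tutorial, while the return values and the default `symbols` argument are assumptions of this sketch. Note that in the tutorial `operator.itemgetter` is only used for sorting the items; the counting itself is the plain dictionary loop.

```python
def clean_up_list(word_list, symbols="!@#$%^&*()_+{}:\"<>?,./;'[]-="):
    """Strip the given symbols from each word and return the non-empty results."""
    clean_word_list = []
    for word in word_list:
        for symbol in symbols:
            word = word.replace(symbol, "")
        if word:
            clean_word_list.append(word)
    return clean_word_list


def create_dictionary(clean_word_list):
    """Return a dict mapping each word to the number of times it occurs."""
    word_count = {}
    for word in clean_word_list:
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


# The caller decides how the steps are chained.
cleaned = clean_up_list(["hello!", "world", "(hello)", "--"])
counts = create_dictionary(cleaned)
print(counts)
```

Written this way, `clean_up_list` can be reused or tested on its own without triggering the counting step.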

  7. Aryan Hegde

    Hey, it worked but it's not arranged in the right order and there are still symbols everywhere

  8. Timothy Wynne

    Love your tutorials. Fantastic. But the dark background makes it difficult to see some of the text that does not contrast drastically with the background. I realize this may be a late recommendation but thought you might like to know anyway. Keep up the great work.

  9. Piotr Purwin

    Why is the last line of each function the name of the next function, but without the 'def' prefix?

  10. Jonathan Rolfsen

    @thenewboston Just a side recommendation: Putting static variable definitions into loops (such as in the case of var 'symbols') wastes processing power. On this small a scale it doesn't matter, but it may teach newbs bad habits. There are a number of other things that could be refactored here that would still make sense to teach as an intro. That aside, great video, keep up the good work.
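
A minimal sketch of that refactoring, assuming Python 3: the symbol string is defined once at module level, and `str.translate` replaces the inner character loop. The constant names and the return value are choices made for this sketch, not part of the tutorial.

```python
# Defined once at module level instead of on every loop iteration.
SYMBOLS = "!@#$%^&*()_+{}:\"<>?,./;'[]-="
# Translation table that maps every symbol to None, which deletes it.
REMOVE_SYMBOLS = str.maketrans("", "", SYMBOLS)


def clean_up_list(word_list):
    """Return the words with all symbols removed, skipping empty results."""
    clean_word_list = []
    for word in word_list:
        word = word.translate(REMOVE_SYMBOLS)
        if word:
            clean_word_list.append(word)
    return clean_word_list


print(clean_up_list(["a!b", "??", "c"]))
```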

  11. Jan Sher Khan

    Hi, how can I print only the first 10 lines of the word-frequency output?
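
One way to do this is to slice the sorted list before looping over it. The sketch below uses a made-up `sample` dictionary and a hypothetical helper `top_words`; it sorts with `reverse=True` so the most frequent words come first.

```python
import operator


def top_words(word_count, limit=10):
    """Return up to `limit` (word, count) pairs, highest count first."""
    ranked = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[:limit]


sample = {"the": 23, "to": 14, "python": 5, "a": 2}
for key, value in top_words(sample, limit=2):
    print(key, value)
```

Dropping `reverse=True` keeps the tutorial's ascending order, in which case the slice gives the ten least frequent words instead.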

  12. Poseidon Ericsson

    For anyone like me who was confused by how "word_count" works, here is my explanation:
    1: word_count = {} creates an empty dictionary.
    2: word_count[word] = 1 is equivalent to word_count.update({word: 1}). It adds "word" with a value of 1 if it is not yet in the "word_count" dictionary. At first no word is, so Python keeps adding words with value 1 until "if word in word_count" is true, and then it adds 1 to the value of that "word".

    Hope this helps, and let me know if anything I stated is incorrect.
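
The explanation can be traced step by step with a tiny made-up word list. Assigning to a key that does not exist yet creates it, so no separate "append" is needed, and the dictionary ends up holding both the words (as keys) and their counts (as values).

```python
word_count = {}
for word in ["spam", "eggs", "spam"]:
    if word in word_count:
        word_count[word] += 1   # key already exists: bump its count
    else:
        word_count[word] = 1    # assignment creates the key and stores 1
    print(word, word_count)     # show the dictionary after each word
```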

  13. 8802rocks

    Made this in Java and it took me twice as many lines of code.

  14. Aditya Sharma

    Can anyone please tell me how word_count[word] +=1 is both increasing the word count and storing the words in word_count? Because we are not appending them in word_count

  15. MrChangCJ

    hmmm… any idea why my output is in unicode as such?
    (u'to', 14)
    (u'a', 14)
    (u'the', 23)
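
That output looks like Python 2, where `print(key, value)` is the print statement applied to a tuple, so the tuple's repr is shown, including the `u` prefix of unicode strings. On Python 3 the same line prints `to 14`. If the script has to stay on Python 2, one option is to put `from __future__ import print_function` at the top of the file; another is to build the line explicitly, as in this sketch (`format_entry` is a hypothetical helper):

```python
def format_entry(key, value):
    """Build the output line explicitly so it looks the same on Python 2 and 3."""
    return "{} {}".format(key, value)


print(format_entry("to", 14))
```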

  16. Komron Aripov

    Working example as of 4/19/2017

    import requests
    from bs4 import BeautifulSoup
    import operator

    def all_words(url):
        word_list = []
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code, "html.parser")
        for before_text in soup.find_all("a", {"class": "result-title"}):
            content = before_text.string
            words = content.lower().split()
            for each_word in words:
                word_list.append(each_word)
        clean_up_list(word_list)

    def clean_up_list(word_list):
        clean_word_list = []
        for word in word_list:
            symbols = "+\"*%&/()=?_-:.;,£$![]}{¦|@#¬~´"
            for i in range(0, len(symbols)):
                word = word.replace(symbols[i], "")
            if len(word) > 0:
                clean_word_list.append(word)
        create_dictionary(clean_word_list)

    def create_dictionary(clean_word_list):
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)

    all_words("https://cnj.craigslist.org/search/sys")

  17. Lobster with Mustard and Rice

    it is amazing what a few lines of python can do.. wow

  18. T Cheng

    To split the words by space and other special characters:

    import re

    special_chars_reg_ex = '[!. (),-/:]' # Change this to include any special characters
    words = re.split(special_chars_reg_ex, content.lower())

  19. Indian Tech Support

    dude you are amazing

  20. Umedzhon Izbasarov

    I have got this error >> "for key, value in sorted(word_count.items(),key=operator.itemgetter(1)):
    ^
    IndentationError: unindent does not match any outer indentation level"

  21. himanshu aggarwal

    How come word_count itself prints the word and the number of times it occurred, if it only saves the frequency of the word?

  22. Ausseriridische

    Your videos are so helpful, people should like them more!

  23. Joe Pannu

    Trying to find the code for this section in the forums. Can you please link it?

  24. Eclipse XIII

    Does anyone know if this code works well with a text file as input?
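
It should carry over with little change, since only the step that produces the word list depends on the web page. A minimal sketch, assuming a UTF-8 text file; `words_from_file` is a hypothetical replacement for the scraping part, and the temporary file only exists to make the demo self-contained.

```python
import os
import tempfile


def words_from_file(path):
    """Read a text file and return its lowercased, whitespace-separated words."""
    with open(path, encoding="utf-8") as text_file:
        return text_file.read().lower().split()


# Demo with a throwaway file; a real script would pass its own path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("The quick fox.\nThe lazy dog.")
    demo_path = handle.name

words = words_from_file(demo_path)
os.remove(demo_path)
print(words)
```

The returned list still contains punctuation such as `fox.`, so it can go through the same clean-up and counting steps as the scraped words.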

  25. sangram patra

    AttributeError: 'Response' object has no attribute 'txt'

    I'm getting this error every time I write this type of program.

  26. Miled Rizk

    Another way to create a dictionary that contains the words and their count:

    word_count = {}

    for word in clean_word_list:
        if word not in word_count:
            word_count[word] = clean_word_list.count(word)

  27. Sean McDougal

    import requests
    from bs4 import BeautifulSoup
    import operator

    def start(url):
        word_list = []
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code, "html.parser")
        for headline_text in soup.findAll('a', {'class': 'hdrlnk'}):
            content = headline_text.string
            words = content.lower().split()
            for each_word in words:
                word_list.append(each_word)
        clean_up_list(word_list)

    def clean_up_list(word_list):
        clean_word_list = []
        for word in word_list:
            symbols = "~!@#$%^&*()[]{}_+`-=,.<>/?;:'\""
            for i in range(0, len(symbols)):
                word = word.replace(symbols[i], "")
            if len(word) > 0:
                print(word)
                clean_word_list.append(word)
        create_dictionary(clean_word_list)

    def create_dictionary(clean_word_list):
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)

    start('https://seattle.craigslist.org/search/jjj')

  28. Arorffifufduetu Dadrigfgdgfdggfdggf

    TypeError: list indices must be integers, not str
    I get this error. What should I do?
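
The failing code was not posted, so this is only a guess: one common cause is creating `word_count` with `[]` (a list) instead of `{}` (a dictionary). The sketch below runs the tutorial's counting loop against both containers; `count_words` is a hypothetical helper written for this comparison.

```python
def count_words(words, container):
    """Count words into `container`; works for a dict, fails for a list."""
    for word in words:
        if word in container:
            container[word] += 1
        else:
            container[word] = 1
    return container


good = count_words(["a", "b", "a"], {})      # dict: keys can be strings

try:
    count_words(["a", "b", "a"], [])         # list: indices must be integers
    error_message = None
except TypeError as error:
    error_message = str(error)

print(good)
print(error_message)
```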

  29. Talha Studios 2

    Bucky, please try to spend more time on the explanations. You know I am a noob.

  30. Burton Poon

    I have 2 questions. Hope somebody can help me by answering them.
    1. The functions are declared after they are called. Shouldn't a function be declared first before it is callable? Why doesn't any error occur?
    2. The array is made inside one function, so why is it accessible from another function?
    Really want to know why. plz <3
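
Both points can be seen in a small sketch with simplified versions of the tutorial's functions. A function name is looked up when the call actually runs, not when the surrounding `def` is read, and the list is reachable from the second function only because it is passed in as an argument.

```python
def start():
    word_list = ["b", "a"]            # local to start()
    return clean_up_list(word_list)   # name is looked up only when this line runs


def clean_up_list(words):
    # `words` refers to the same list object that start() created and passed in.
    return sorted(words)


# Both functions are defined by the time this call executes, so it works.
result = start()
print(result)
```

Moving the `start()` call above `def clean_up_list` would raise a NameError, because at that moment the second function would not exist yet.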

  31. Suraj Upreti

    I have a problem with this. When I tried the same logic to sort by key, it doesn't sort. I am using PyCharm and Python 3.4.
    I also tried another example just to check this issue:
    x = {'suraj': 2, 'apple': 4, 'zebra': 3, 'peacock': 1, 0: 0}
    for key, value in sorted(x.items(), key=operator.itemgetter(0)):
        print(key, value)
    It says "TypeError: unorderable types: str() < int()"
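
The error comes from the key `0`: it is an int, while the other keys are strings, and Python 3 refuses to compare the two. Sorting by value still works because every value is an int. One possible workaround, assuming the mixed keys have to stay, is to compare the keys as strings:

```python
import operator

x = {'suraj': 2, 'apple': 4, 'zebra': 3, 'peacock': 1, 0: 0}

# Sorting by value works because every value is an int.
by_value = sorted(x.items(), key=operator.itemgetter(1))

# Sorting by key needs comparable keys; converting each key to str is one option.
by_key = sorted(x.items(), key=lambda item: str(item[0]))

for key, value in by_key:
    print(key, value)
```

Replacing `0: 0` with a string key such as `'0': 0` would let the original `operator.itemgetter(0)` version run unchanged.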

  32. rungus24

    Can anyone tell me if it matters in what order the functions are defined? In this above thing I think function a is defined first and in that function b is called, then function b is defined, and so on. Does the order matter in Python? Is there a conventional way to do it? And am I right in thinking that in Python the functions all had to be defined before start() could be called? Thanks

  33. JayUPL

    I should probably stop using a in my sentences. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. Dammit

  34. H N

    You could import Counter from the collections module. That way you won't need to write the create_dictionary function; let Counter do the job instead. https://docs.python.org/2/library/collections.html
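
A short sketch of that suggestion, using a made-up word list in place of the scraped one:

```python
from collections import Counter

clean_word_list = ["spam", "eggs", "spam", "ham", "spam", "eggs"]
word_count = Counter(clean_word_list)

# most_common() returns (word, count) pairs, highest count first.
for word, count in word_count.most_common(2):
    print(word, count)
```

`Counter` is a dictionary subclass, so `word_count["spam"]` and `word_count.items()` behave as they do with the hand-built dictionary.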

  35. laharish guntuka

    Mannn. . . . .No words to express. . .  YOu are awesomeeee. . . .

  36. Sneha S

    Thank you sooooo much for these

  37. Irv Do

    Actually, Python is used for multiplying vectors and matrices and for other machine learning and data analysis computations. It's a very powerful language in that regard.

    Bucky, you're the man, keep it up boss.

  38. Eric Liang

    Your python tutorial series gave me a good programming project idea 😀

  39. Danzi979

    Great tutorial series! This seems fine for most words but as the apostrophes have been taken out and the text is all lower case, words like I'll and ill will be counted as one word. How can this be fixed?
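
One possible approach is to extract words with a regular expression that keeps apostrophes inside a word and to skip the lowercasing step. The pattern below is an assumption about what should count as a word (letters only, with optional inner apostrophes), and `count_words` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters that may contain inner apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_words(text):
    """Count words without lowercasing, keeping apostrophes inside words."""
    return Counter(WORD_PATTERN.findall(text))


counts = count_words("I'll rest if I feel ill. I'll be fine!")
print(counts)
```

The trade-off of keeping the case is that "The" and "the" are then counted separately, so a real script might lowercase everything except a short list of known exceptions.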

  40. GonsalvoDeCordova

    Bucky – I use these in a classroom.  If you could go back to doing full HD/1080 that would help, as I project this on a screen as I heckle … er … make insightful comments.
