Python Programming Tutorial – 37 – Word Frequency Counter (3/3)
Comments
40 responses to “Python Programming Tutorial – 37 – Word Frequency Counter (3/3)”
-
for key, value in sorted(word_count.items(),key=operator.itemgetter(1)):
^
IndentationError: unindent does not match any outer indentation level
-
5:40 – a way to sort a dictionary by value (WOW)
-
slownik[przod]=tyl – adding a word to the dictionary
-
Hello everyone, I'm enjoying the course so far (complete newbie to coding) and can get this code to work for a single page (URL) of a local listings website. However, when I try to pull multiple pages of data into the word counter I keep getting errors no matter how I jig the code. Can someone have a quick look at what might be wrong?
#Word Counter
from bs4 import BeautifulSoup
import requests
import operator

max_pages = 1

def Spider(max_pages):
    page_number = 2
    while page < max_pages:
        url = 'https://www.trademe.co.nz/browse/motors/default.aspx?cid=1&page=' + str(page_number) + '&sort_order=motors_default&rptpath=1-268-'
        page_number = page_number + 1

def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, "html.parser")
    for post_text in soup.findAll('a', {'class': 'dotted'}):
        content = post_text.string
        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!@#$%^&*()~`_-<>?"
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            print(word)
            clean_word_list.append(word)
    create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)

start(url)
-
31.01.2018 - fixed the issues and is now fully working
import requests
from bs4 import BeautifulSoup
import operator

def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, "html.parser")
    c = soup.get_text()
    # lower-case first, then split, so the lower-casing is not thrown away
    words = c.lower().split()
    for each_word in words:
        word_list.append(each_word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-='"
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)

start("https://en.wikipedia.org/wiki/Billionaire_Boys_Club")
-
Well, the first 2 videos were quite fine, but I can't say the same for the last one. First of all, you created a method called "clean_up_list", which gets rid of all the unwanted symbols in the words. But in this method you also call another method, which creates the dictionary to count frequency. I think it is not a good idea to call that method inside "clean_up_list", because that method is supposed to clean up words, not create a dictionary. On the other hand, you are using the "operator" module to count frequency, which is actually not a complicated job. I was expecting to see some efficient algorithm to do that.
In any case, your videos are really simple and easy to understand. I am glad you are doing this. Thank you so much! 🙂
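One way to address the point about "clean_up_list" doing two jobs is to have each function return its result and let the caller chain them. This is only a minimal sketch: the `report` function, the module-level `SYMBOLS` constant and the `raw_words` sample list are made up for illustration and are not from the video.

```python
import operator

SYMBOLS = "!@#$%^&*()~`_-<>?"  # assumed symbol set, similar to the tutorial's


def clean_up_list(word_list):
    """Return a new list with symbols stripped and empty words dropped."""
    clean_word_list = []
    for word in word_list:
        for symbol in SYMBOLS:
            word = word.replace(symbol, "")
        if word:
            clean_word_list.append(word)
    return clean_word_list


def create_dictionary(clean_word_list):
    """Return a dict mapping each word to the number of times it appears."""
    word_count = {}
    for word in clean_word_list:
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


def report(word_count):
    """Print the words from least to most frequent."""
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)


# The caller wires the steps together; no step calls the next one itself.
raw_words = ["hello!", "world", "hello", "??"]  # stand-in for scraped words
counts = create_dictionary(clean_up_list(raw_words))
report(counts)
```

With this shape each function can be tested on its own, and the cleaning step can be reused without also printing a frequency table.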
-
Hey, it worked but it's not arranged in the right order and there are still symbols everywhere
-
Love your tutorials. Fantastic. But the dark background makes it difficult to see some of the text that does not contrast drastically with the background. I realize this may be a late recommendation but thought you might like to know anyway. Keep up the great work.
-
Why is the last line of each function the name of the next function, but without the 'def' prefix?
-
@thenewboston Just a side recommendation: Putting static variable definitions into loops (such as in the case of var 'symbols') wastes processing power. On this small a scale it doesn't matter, but it may teach newbs bad habits. There are a number of other things that could be refactored here that would still make sense to teach as an intro. That aside, great video, keep up the good work.
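A rough sketch of that refactor, assuming Python 3: the symbol string is defined once outside the loop, and the inner character loop is swapped for `str.translate`, which is a different technique from the one in the video. The names `SYMBOLS` and `REMOVE_SYMBOLS` are made up here.

```python
SYMBOLS = "!@#$%^&*()~`_-<>?"  # defined once, not on every loop iteration

# Translation table that deletes every character listed in SYMBOLS.
REMOVE_SYMBOLS = str.maketrans("", "", SYMBOLS)


def clean_up_list(word_list):
    """Strip the symbols from each word and drop words that end up empty."""
    clean_word_list = []
    for word in word_list:
        word = word.translate(REMOVE_SYMBOLS)
        if word:
            clean_word_list.append(word)
    return clean_word_list


print(clean_up_list(["c++?", "***", "py_thon"]))
```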
-
Hi, how can I print only first 10 lines of word frequency sets in the output?
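One possible approach, as a sketch: `sorted` returns a list, so it can be sliced before printing. The helper name `print_top` and the `sample` data are invented for this example; `reverse=True` is added so the most frequent words come first.

```python
import operator


def print_top(word_count, limit=10):
    """Print and return the `limit` most frequent (word, count) pairs."""
    ranked = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)
    top = ranked[:limit]  # slicing keeps only the first `limit` entries
    for key, value in top:
        print(key, value)
    return top


sample = {"the": 23, "to": 14, "a": 14, "python": 3}
print_top(sample, limit=2)
```

To keep the tutorial's ascending order instead, drop `reverse=True` and slice with `ranked[-limit:]`.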
-
For anyone like me who was confused by how "word_count" works, here is my explanation:
1: word_count = {} creates an empty dictionary.
2: word_count[word] = 1 is equivalent to word_count.update({word: 1}). It adds "word" with a value of 1 if it is not yet in the "word_count" dictionary. Since the dictionary starts out empty, Python keeps adding new words with the value 1 until "if word in word_count" is true, and then it adds 1 to the value of that "word".
Hope this helps, and let me know if anything I stated is incorrect.
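The explanation above can be traced with a tiny made-up word list, printing the dictionary after each step:

```python
word_count = {}
for word in ["the", "cat", "the"]:
    if word in word_count:
        # The key already exists, so only its value changes.
        word_count[word] += 1
    else:
        # Assigning to a missing key is what stores the word itself.
        word_count[word] = 1
    print(word, "->", word_count)
```

The word is stored by the `else` branch the first time it is seen; `+= 1` only ever updates a key that is already there.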
-
made this in java and took me twice as many lines of code
-
Can anyone please tell me how word_count[word] +=1 is both increasing the word count and storing the words in word_count? Because we are not appending them in word_count
-
hmmm… any idea why my output is in unicode as such?
(u'to', 14)
(u'a', 14)
(u'the', 23)
-
Working example as of 4/19/2017
import requests
from bs4 import BeautifulSoup
import operator

def all_words(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, "html.parser")
    for before_text in soup.find_all("a", {"class": "result-title"}):
        content = before_text.string
        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "+\"*%&/()=?_-:.;,£$![]}{¦|@#¬~´"
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)

all_words("https://cnj.craigslist.org/search/sys")
-
it is amazing what a few lines of python can do.. wow
-
To split the words by space and other special characters:
import re
special_chars_reg_ex = '[!. (),-/:]' # Change this to include any special characters
words = re.split(special_chars_reg_ex, content.lower())
-
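A small follow-up sketch on the re.split idea: splitting on single characters leaves empty strings wherever two separators sit next to each other, so this version splits on runs of separators and filters out the leftovers. `SPLIT_PATTERN` and `split_words` are names made up for the example; the hyphen is escaped so it is read as a literal character rather than a range.

```python
import re

# Same characters as the pattern above, with "+" to treat a run as one separator.
SPLIT_PATTERN = r"[!. (),\-/:]+"


def split_words(content):
    """Lower-case the text, split on the separators, and drop empty strings."""
    return [word for word in re.split(SPLIT_PATTERN, content.lower()) if word]


print(split_words("Hello, world! (Python/3.4)"))
```
-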
dude you are amazing
-
I have got this error:
for key, value in sorted(word_count.items(),key=operator.itemgetter(1)):
^
IndentationError: unindent does not match any outer indentation level
-
How come word_count itself prints the word and the number of times it occurred, if it only saves the frequency of the word?
-
Your videos are so helpful, people should like them more!
-
Trying to find the code for this section in the forums. Can you please link it?
-
Does anyone know if this code translates well to a text file input into Python?
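It should, since only the step that fetches the text changes. A minimal sketch, assuming a UTF-8 text file; `words_from_file` is a made-up helper, and the temporary file exists only so the example runs on its own:

```python
import os
import tempfile


def words_from_file(path):
    """Read a text file and return its lower-cased, whitespace-separated words."""
    with open(path, encoding="utf-8") as text_file:
        return text_file.read().lower().split()


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("The cat saw the other cat")
    sample_path = handle.name

word_list = words_from_file(sample_path)
os.remove(sample_path)
print(word_list)
```

The resulting `word_list` can then be passed to the same clean-up and counting functions used for the web page.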
-
AttributeError: 'Response' object has no attribute 'txt'
I'm getting this error every time I write this type of program.
-
Another way to create a dictionary that contains the words and their count:
word_count = {}
for word in clean_word_list:
    if word not in word_count:
        word_count[word] = clean_word_list.count(word)
-
import requests
from bs4 import BeautifulSoup
import operator

def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, "html.parser")
    for headline_text in soup.findAll('a', {'class': 'hdrlnk'}):
        content = headline_text.string
        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "~!@#$%^&*()[]{}_+`-=,.<>/?;:'\""
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            print(word)
            clean_word_list.append(word)
    create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)
-
TypeError: list indices must be integers, not str
I get this error. What should I do?
-
bucky please try to give more time on explanation you know i am a noob
-
I have 2 questions. Hope somebody can help me by answering them.
1. The functions are declared after they are called. Shouldn't a function be declared first, before it is callable? Why doesn't an error occur?
2. The array is made inside the function, so why is it accessible from another function?
Really want to know why. plx <3
-
I have a problem with this. When I tried the same logic to sort by key, it doesn't sort. I am using PyCharm and Python 3.4.
I also tried another example just to check this issue.
x = {'suraj': 2, 'apple': 4, 'zebra': 3, 'peacock': 1, 0: 0}
for key, value in sorted(x.items(), key=operator.itemgetter(0)):
    print(key, value)
It says "TypeError: unorderable types: str() < int()"
-
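The error comes from the key 0: Python 3 refuses to compare an int with a str, so the keys cannot be put in order. One possible workaround, as a sketch, is to sort on the string form of each key; that choice is an assumption for this example, and making all keys the same type would be the cleaner fix.

```python
import operator

x = {'suraj': 2, 'apple': 4, 'zebra': 3, 'peacock': 1, 0: 0}

# Compare keys by their string form so the int 0 and the str keys can be ordered.
by_key = sorted(x.items(), key=lambda item: str(item[0]))
for key, value in by_key:
    print(key, value)

# Sorting by value still works unchanged, because all the values are ints.
by_value = sorted(x.items(), key=operator.itemgetter(1))
```
-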
Can anyone tell me if it matters in what order the functions are defined? In this above thing I think function a is defined first and in that function b is called, then function b is defined, and so on. Does the order matter in Python? Is there a conventional way to do it? And am I right in thinking that in Python the functions all had to be defined before start() could be called? Thanks
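A short sketch of the rule, with made-up function names: a name inside a function body is looked up when that line runs, not when the function is defined. So the order of the `def` blocks does not matter, as long as every function exists by the time the first call actually executes.

```python
def first():
    # `second` is looked up only when this line runs, not when `first` is defined.
    return second() + 1


def second():
    return 41


result = first()  # fine: both functions exist by now
print(result)


def call_too_early():
    """Call a function that has not been defined yet and report what happens."""
    try:
        return not_defined_yet()
    except NameError as error:
        return str(error)


message = call_too_early()  # runs before the def below has executed
print(message)


def not_defined_yet():
    return "now I exist"
```

This is why the call to start() sits at the very bottom of the file in the tutorial: by then every `def` above it has already run.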
-
I should probably stop using a in my sentences. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. Damn it
-
You could import Counter from the collections module; that way you won't need to write the create_dictionary function and can let Counter do the job instead. https://docs.python.org/2/library/collections.html
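A minimal sketch of that suggestion, using a made-up `clean_word_list` in place of the scraped words:

```python
from collections import Counter

clean_word_list = ["the", "cat", "saw", "the", "other", "cat", "the"]

# Counter builds the word -> count mapping in one step.
word_count = Counter(clean_word_list)

# It behaves like a dict, so the tutorial's sorted loop still works on it.
for key, value in sorted(word_count.items(), key=lambda item: item[1]):
    print(key, value)

# most_common(n) returns the n most frequent (word, count) pairs, highest first.
top_two = word_count.most_common(2)
print(top_two)
```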
-
Mannn. . . . .No words to express. . . YOu are awesomeeee. . . .
-
Thank you sooooo much for these
-
Actually, Python is used for multiplying vectors and matrices and for other machine learning/data analysis computations. It's a very powerful language in that regard.
Bucky, you're the man, keep it up boss.
-
Your python tutorial series gave me a good programming project idea 😀
-
Great tutorial series! This seems fine for most words but as the apostrophes have been taken out and the text is all lower case, words like I'll and ill will be counted as one word. How can this be fixed?
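One possible fix, sketched with a regular expression in place of the symbol-stripping loop: match runs of letters and allow an apostrophe only when it sits between letters. `WORD_PATTERN` and `extract_words` are made-up names, and the pattern assumes plain ASCII letters and straight apostrophes.

```python
import re

# Letters, optionally followed by apostrophe-plus-letters groups (i'll, don't).
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def extract_words(text):
    """Return lower-cased words, keeping apostrophes that sit inside a word."""
    return WORD_PATTERN.findall(text.lower())


print(extract_words("I'll be ill"))
```

Quotation marks around a word are still dropped, because an apostrophe is kept only when letters follow it inside the same word.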
-
Bucky – I use these in a classroom. If you could go back to doing full HD/1080 that would help, as I project this on a screen as I heckle … er … make insightful comments.