Storing the topic models in a list also considering the maximum occurrences
I am performing topic modelling and using functions to get the top keywords in the topic models as shown below.
def getTopKWords(self, K):
    """
    returns top K discriminative words for topic t
    ie words v for which p(v|t) is maximum
    """
    results = []
    index = []
    key_terms = []
    pseudocounts = np.copy(self.n_vt)
    normalizer = np.sum(pseudocounts, (0))
    pseudocounts /= normalizer[np.newaxis, :]
    for t in range(self.numTopics):
        topWordIndices = pseudocounts[:, t].argsort()[-1:-(K+1):-1]
        vocab = self.vectorizer.get_feature_names()
        print(t, [vocab[i] for i in topWordIndices])
        ## Code for storing the values in a single list
    return results
The above function gives me the output shown below:
0 ['computer', 'laptop', 'mac', 'use', 'bought', 'like', 'warranty', 'screen', 'way', 'just']
1 ['laptop', 'computer', 'use', 'just', 'like', 'time', 'great', 'windows', 'macbook', 'months']
2 ['computer', 'great', 'laptop', 'mac', 'buy', 'just', 'macbook', 'use', 'pro', 'windows']
3 ['laptop', 'computer', 'great', 'time', 'battery', 'use', 'apple', 'love', 'just', 'work']
This output comes from the 4 iterations of the loop; each one prints the topic index and the top keywords for that topic.
Now I want the function to return a single list, like this:
return [keyword1, keyword2, keyword3, keyword4]
where keyword1/2/3/4 are the words that occur most often across the keyword lists with index 0, 1, 2, 3 in the output.
python python-3.x
edited Nov 10 at 18:49
asked Nov 10 at 18:21
Shivam Panchal
1 Answer
accepted
You can use collections.Counter:
from collections import Counter

a = ['computer', 'laptop', 'mac', 'use', 'bought', 'like',
     'warranty', 'screen', 'way', 'just']
b = ['laptop', 'computer', 'use', 'just', 'like', 'time',
     'great', 'windows', 'macbook', 'months']
c = ['computer', 'great', 'laptop', 'mac', 'buy', 'just',
     'macbook', 'use', 'pro', 'windows']
d = ['laptop', 'computer', 'great', 'time', 'battery', 'use',
     'apple', 'love', 'just', 'work']

def get_most_common(*iterables):
    """Accepts iterables, feeds all into Counter and returns the Counter instance"""
    counter = Counter()
    for iterable in iterables:
        counter.update(iterable)
    return counter

# get the most common ones
mc = get_most_common(a, b, c, d).most_common()
# print top 4 keys
top4 = [k for k, v in mc[0:4]]
print(top4)
Output:
['computer', 'laptop', 'use', 'just']
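For comparison, here is a more compact sketch of the same counting idea. It is not part of the original answer: the helper name `most_common_words` and the variable `topics` are made up for illustration. It flattens the lists with `itertools.chain.from_iterable` and lets `most_common(n)` do the truncation instead of slicing afterwards.

```python
from collections import Counter
from itertools import chain


def most_common_words(word_lists, n):
    """Return the n words that appear most often across all the given lists."""
    # chain.from_iterable flattens the list of lists into one stream of words
    counts = Counter(chain.from_iterable(word_lists))
    # most_common(n) returns the n (word, count) pairs with the highest counts
    return [word for word, _ in counts.most_common(n)]


# hypothetical container holding the four keyword lists from the question
topics = [
    ['computer', 'laptop', 'mac', 'use', 'bought', 'like',
     'warranty', 'screen', 'way', 'just'],
    ['laptop', 'computer', 'use', 'just', 'like', 'time',
     'great', 'windows', 'macbook', 'months'],
    ['computer', 'great', 'laptop', 'mac', 'buy', 'just',
     'macbook', 'use', 'pro', 'windows'],
    ['laptop', 'computer', 'great', 'time', 'battery', 'use',
     'apple', 'love', 'just', 'work'],
]

print(most_common_words(topics, 4))
```

Only 'computer', 'laptop', 'use' and 'just' appear in all four lists, so those are the four words returned here.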
Inside your function, collect the keyword list of each topic and count them after the loop:

some_results = []  # store stuff
for t in range(self.numTopics):
    topWordIndices = pseudocounts[:, t].argsort()[-1:-(K+1):-1]
    vocab = self.vectorizer.get_feature_names()
    print(t, [vocab[i] for i in topWordIndices])
    some_results.append([vocab[i] for i in topWordIndices])
mc = get_most_common(*some_results).most_common()
return [k for k, v in mc[0:4]]
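To try the whole flow without the surrounding class, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: the function name `top_words_across_topics`, its parameters, and the toy `n_vt` matrix and `vocab` list are not from the original code. It assumes `n_vt` has one row per vocabulary word and one column per topic, as the indexing `pseudocounts[:, t]` suggests.

```python
from collections import Counter

import numpy as np


def top_words_across_topics(n_vt, vocab, k, n):
    """Take the top-k words of every topic, then return the n words
    that show up in the most of those top-k lists."""
    # normalise each column so it holds p(v|t); this does not modify n_vt
    probs = n_vt / n_vt.sum(axis=0, keepdims=True)
    counter = Counter()
    for t in range(probs.shape[1]):
        # indices of the k largest entries in column t, largest first
        top_idx = probs[:, t].argsort()[::-1][:k]
        counter.update(vocab[i] for i in top_idx)
    return [word for word, _ in counter.most_common(n)]


# toy data: 4 words, 2 topics (made up for this example)
vocab = ["apple", "battery", "computer", "laptop"]
n_vt = np.array([[1.0, 5.0],
                 [2.0, 1.0],
                 [9.0, 8.0],
                 [7.0, 2.0]])

print(top_words_across_topics(n_vt, vocab, k=2, n=1))
```

With this toy data, 'computer' is in the top 2 of both topics, so it is the single word returned. If `self.vectorizer` is a scikit-learn vectorizer, note that recent releases provide `get_feature_names_out()` in place of `get_feature_names()`.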
@ShivamPanchal what? It is one function that you pass your lists to. .most_common() is explained in the documentation of Counter - read it. top4 is just list slicing of the (key, count) tuples provided by most_common(). Your code above uses list slicing - so that's nothing new to you - is it?
– Patrick Artner
Nov 10 at 18:57
I am trying to use it in my code, but it is not working. Can you add it to my code? That would be great.
– Shivam Panchal
Nov 10 at 19:22
Really sorry, but I got this: TypeError: unhashable type: 'slice'
– Shivam Panchal
Nov 10 at 19:51
@ShivamPanchal forgot a .most_common()
– Patrick Artner
Nov 10 at 19:51
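The error in the comments is consistent with slicing the Counter itself instead of the list returned by .most_common(). The sketch below illustrates that likely cause with made-up data. A Counter is a dict subclass, so `counts[0:2]` is a key lookup with a slice object as the key. On older Python 3 versions that raises the TypeError quoted above; on Python 3.12 and later, where slice objects are hashable, it returns 0 for the missing key instead.

```python
from collections import Counter

counts = Counter(["laptop", "computer", "laptop", "mac"])

# Key lookup with a slice object, not a positional slice
try:
    sliced = counts[0:2]
except TypeError as exc:
    # older Python 3: slice objects are unhashable
    print("TypeError:", exc)
else:
    # newer Python 3: the slice is just a missing key, which counts as 0
    print("lookup returned", sliced)

# most_common() returns a plain list of (word, count) tuples, which can be sliced
ranked = counts.most_common()
top2 = [word for word, _ in ranked[0:2]]
print(top2)
```

Either way, the slice belongs on the result of most_common(), not on the Counter.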
edited Nov 10 at 19:51
answered Nov 10 at 18:37
Patrick Artner