Python Combine Two Dictionaries To Nested Dictionary (text Similarity)

I have the following documents:

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

From which I build a wordmatrix:

wordmatrix = [sentence.split(" ") for sentence in documents]

With the output:

[['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'], ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'], ['The', 'EPS', 'user', 'interface', 'management', 'system'], ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'], ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'], ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'], ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'], ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'], ['Graph', 'minors', 'A', 'survey']]

Next, I want to create a nested dictionary with one key per document. Each value should itself be a dictionary that maps every word in that document to the number of times the word appears in the document.

But this is as far as I have gotten:

Initialize the dictionaries:

dic1 = {}
dic2 = {}
d = {}

With the first dictionary giving each document a key:

dic1 = dict(enumerate(wordmatrix))

with the output:

{0: ['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'], 1: ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'], 2: ['The', 'EPS', 'user', 'interface', 'management', 'system'], 3: ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'], 4: ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'], 5: ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'], 6: ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'], 7: ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'], 8: ['Graph', 'minors', 'A', 'survey']}

And the second dictionary, which makes each word a key:

for sentence in wordmatrix:
    for word in sentence:
        dic2[word] = dic2.get(word, 0) + 1

With the output:

{'Human': 1, 'machine': 1, 'interface': 2, 'for': 1, 'lab': 1, 'abc': 1, 'computer': 2, 'applications': 1, 'A': 2, 'survey': 2, 'of': 7, 'user': 3, 'opinion': 1, 'system': 3, 'response': 2, 'time': 2, 'The': 3, 'EPS': 2, 'management': 1, 'System': 1, 'and': 2, 'human': 1, 'engineering': 1, 'testing': 1, 'Relation': 1, 'perceived': 1, 'to': 1, 'error': 1, 'measurement': 1, 'generation': 1, 'random': 1, 'binary': 1, 'unordered': 1, 'trees': 3, 'intersection': 1, 'graph': 1, 'paths': 1, 'in': 1, 'Graph': 2, 'minors': 2, 'IV': 1, 'Widths': 1, 'well': 1, 'quasi': 1, 'ordering': 1}

However, I would like to combine both dictionaries into one dictionary, which should look like this: {0: {'Human':1, 'machine':1, 'interface':2, ....}, 1: (and so on)}




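You don't need to combine the two dictionaries. Once you have dic2, you can build the nested dictionary directly from wordmatrix and look up each word's count in dic2. Note that these are counts over all documents, which matches your expected output ('interface': 2 for document 0, although the word occurs only once in that document):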

for line_num, sentence in enumerate(wordmatrix):
    dic1[line_num] = {}
    for word in sentence:
        dic1[line_num][word] = dic2[word]
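
If you actually want the count within each document, as the question text describes, a per-document tally is needed instead of a lookup in dic2. Here is a minimal sketch of both variants, assuming the same whitespace tokenization and using collections.Counter from the standard library. The names corpus_counts, nested_corpus and nested_per_doc are made up for illustration, and only two of the documents are used to keep it short:

```python
from collections import Counter

# Two of the documents from the question, enough to show the difference.
documents = [
    "A survey of user opinion of computer system response time",
    "Graph minors A survey",
]
wordmatrix = [sentence.split(" ") for sentence in documents]

# Corpus-wide totals, equivalent to dic2 in the question.
corpus_counts = Counter(word for sentence in wordmatrix for word in sentence)

# Same result as the loop above: each word maps to its corpus-wide total.
nested_corpus = {
    line_num: {word: corpus_counts[word] for word in sentence}
    for line_num, sentence in enumerate(wordmatrix)
}

# Per-document variant: each sentence is counted on its own.
nested_per_doc = {
    line_num: dict(Counter(sentence))
    for line_num, sentence in enumerate(wordmatrix)
}

print(nested_corpus[1])   # {'Graph': 1, 'minors': 1, 'A': 2, 'survey': 2}
print(nested_per_doc[1])  # {'Graph': 1, 'minors': 1, 'A': 1, 'survey': 1}
```

Which variant is right depends on what the similarity measure expects. The corpus-wide version reproduces the output shown in the question, while the per-document version gives plain term frequencies for each document.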