1 - "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow."

2 - "Professor Plumb has a green plant in his study."

3 - "Miss Scarlett watered Professor Plumb's green plant while he was away from his office last week."

Document lengths (in words):
l1 = 19
l2 = 9
l3 = 16

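Here is a minimal Python sketch of the tokenisation that these lengths imply. The regex and the `tokenize` name are assumptions and do not come from the notes. Text is lowercased, "Mr." becomes `mr`, and "Plumb's" stays a single token. With those rules the counts come out to 19, 9 and 16.

```python
import re

# The three example documents from the notes.
DOCS = {
    1: "Mr. Green killed Colonel Mustard in the study with the candlestick. "
       "Mr. Green is not a very nice fellow.",
    2: "Professor Plumb has a green plant in his study.",
    3: "Miss Scarlett watered Professor Plumb's green plant while he was away "
       "from his office last week.",
}


def tokenize(text: str) -> list[str]:
    """Assumed tokeniser: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


lengths = {doc_id: len(tokenize(text)) for doc_id, text in DOCS.items()}
print(lengths)  # {1: 19, 2: 9, 3: 16}
```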
q1 = "green"
q1 vector over [mr, green] = [0.0, 0.71]

Document vectors for q1 (tf * idf, where tf = count / length):
1 = [0.0, 0.0747]
2 = [0.0, 0.0789]
3 = [0.0, 0.0444]

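These document vectors multiply a length-normalised tf (count / length) by the term's idf. The sketch below computes the "green" component. It uses the idf value 0.71 from the term table and does not derive it, because the notes don't give the idf formula. `LENGTHS` and `GREEN_COUNTS` are hand-copied from the three documents.

```python
# Document lengths and raw counts of "green", read off the three documents.
LENGTHS = {1: 19, 2: 9, 3: 16}
GREEN_COUNTS = {1: 2, 2: 1, 3: 1}
IDF_GREEN = 0.71  # idf listed for "green" in the notes' term table


def tf(count: int, length: int) -> float:
    """Term frequency normalised by document length."""
    return count / length


# tf * idf for "green" in each document, rounded to 4 decimal places.
green_weights = {
    doc_id: round(tf(GREEN_COUNTS[doc_id], LENGTHS[doc_id]) * IDF_GREEN, 4)
    for doc_id in LENGTHS
}
print(green_weights)  # {1: 0.0747, 2: 0.0789, 3: 0.0444}
```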
green : total count = 4, idf = 0.71
mr : total count = 2, idf = 1.40
the : total count = 2, idf = 1.40
plant : total count = 2, idf = 1.40

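The sketch below shows how the "total count" column can be derived. It also computes document frequency (the number of documents that contain the term), because that is the count the textbook idf uses. The tokeniser is the same assumed regex as before.

```python
import re
from collections import Counter

# The three example documents from the notes.
DOCS = {
    1: "Mr. Green killed Colonel Mustard in the study with the candlestick. "
       "Mr. Green is not a very nice fellow.",
    2: "Professor Plumb has a green plant in his study.",
    3: "Miss Scarlett watered Professor Plumb's green plant while he was away "
       "from his office last week.",
}


def tokenize(text: str) -> list[str]:
    # Assumed tokenisation: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


total_count = Counter()  # every occurrence across the corpus ("total count")
doc_freq = Counter()     # number of documents that contain the term
for text in DOCS.values():
    tokens = tokenize(text)
    total_count.update(tokens)
    doc_freq.update(set(tokens))  # each term counted at most once per document

for term in ("green", "mr", "the", "plant"):
    print(f"{term}: total count = {total_count[term]}, doc freq = {doc_freq[term]}")
```

This prints total counts of 4, 2, 2 and 2, and document frequencies of 3, 1, 1 and 2. Plain log(N / df) would give "green" an idf of 0, since "green" appears in all three documents. The 0.71 in the table therefore comes from some other formula, so this sketch does not re-derive it.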
q2 = "Mr. Green"
q2 vector over [mr, green] = [1.40, 0.71]

Document vectors for q2:
1 = [0.147, 0.0747]
2 = [0.0, 0.0789]
3 = [0.0, 0.0444]

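The notes don't say how a document vector is compared with the query vector. Cosine similarity is a common choice, and the sketch below assumes it. The document vectors are rebuilt from raw counts, lengths and the table's idf values, so the sketch does not depend on the rounded figures.

```python
import math

# Counts and lengths from the three documents; idf values from the notes' table.
LENGTHS = {1: 19, 2: 9, 3: 16}
COUNTS = {"mr": {1: 2}, "green": {1: 2, 2: 1, 3: 1}}
IDF = {"mr": 1.40, "green": 0.71}
TERMS = ["mr", "green"]  # q2 = "Mr. Green"


def doc_vector(doc_id: int) -> list[float]:
    # tf * idf per query term, with tf = count / length
    return [COUNTS[t].get(doc_id, 0) / LENGTHS[doc_id] * IDF[t] for t in TERMS]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = [IDF[t] for t in TERMS]
scores = {d: cosine(query, doc_vector(d)) for d in LENGTHS}
ranking = sorted(scores, key=scores.get, reverse=True)
print({d: round(s, 3) for d, s in scores.items()})  # {1: 1.0, 2: 0.452, 3: 0.452}
```

Doc 1 scores about 1.0 because it uses "mr" and "green" equally often, so its vector points the same way as the query. Docs 2 and 3 tie because only their "green" component is non-zero, and the length normalisation cancels out in the cosine.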
q3 = "the green plant"
q3 vector over [the, green, plant] = [0.5, 0.25, 0.5]

Document vectors for q3:
1 = [1, 0.5, 0]
2 = [0, 0.25, 0.5]
3 = [0, 0.25, 0.5]

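The q3 document vectors are not length-normalised like those for q1 and q2. The numbers match one reading: each component is the term's raw count multiplied by the query weight. The sketch below encodes that reading, which is inferred from the figures and is an assumption. It also scores each document with a dot product, which is another assumption.

```python
import re

QUERY_TERMS = ["the", "green", "plant"]
QUERY_WEIGHTS = [0.5, 0.25, 0.5]  # q3 vector as written in the notes

# The three example documents from the notes.
DOCS = {
    1: "Mr. Green killed Colonel Mustard in the study with the candlestick. "
       "Mr. Green is not a very nice fellow.",
    2: "Professor Plumb has a green plant in his study.",
    3: "Miss Scarlett watered Professor Plumb's green plant while he was away "
       "from his office last week.",
}


def tokenize(text: str) -> list[str]:
    # Assumed tokenisation: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def weighted_counts(text: str) -> list[float]:
    # Assumption: component = raw term count * query weight (no length normalisation).
    tokens = tokenize(text)
    return [tokens.count(t) * w for t, w in zip(QUERY_TERMS, QUERY_WEIGHTS)]


vectors = {d: weighted_counts(text) for d, text in DOCS.items()}
scores = {d: sum(q * x for q, x in zip(QUERY_WEIGHTS, v)) for d, v in vectors.items()}
print(vectors)  # {1: [1.0, 0.5, 0.0], 2: [0.0, 0.25, 0.5], 3: [0.0, 0.25, 0.5]}
print(scores)   # {1: 0.625, 2: 0.3125, 3: 0.3125}
```

Under this reading, doc 1 outranks docs 2 and 3 for "the green plant" even though it never mentions a plant. Its repeated "the" and "green" are not divided by document length.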
Inverted index as a trie

Values are {docId: score}, where score is the sum of the term's tf across fields, with field multipliers applied.
When querying, calculate the idf and multiply it by the stored tf score.

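Here is a minimal sketch of the stored score: a per-field tf, weighted by a field multiplier and summed across fields, with idf applied only at query time. The notes mention multipliers but give no fields or values. The `title`/`body` fields, their weights, the `stored_scores` helper and the example title are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical field names and multipliers (not specified in the notes).
FIELD_MULTIPLIERS = {"title": 2.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    # Assumed tokenisation: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def stored_scores(doc: dict[str, str]) -> Counter:
    """Return {term: score}, score = sum over fields of multiplier * count / field length."""
    scores = Counter()
    for field_name, text in doc.items():
        tokens = tokenize(text)
        if not tokens:
            continue
        multiplier = FIELD_MULTIPLIERS.get(field_name, 1.0)
        for term, count in Counter(tokens).items():
            scores[term] += multiplier * count / len(tokens)
    return scores


# Hypothetical document: doc 2's sentence as the body plus an invented title.
doc = {"title": "Green plant", "body": "Professor Plumb has a green plant in his study."}
scores = stored_scores(doc)
green_score = scores["green"] * 0.71  # idf of "green" from the notes, applied at query time
```

For this document, "green" is stored as 2.0 * 1/2 + 1.0 * 1/9 ≈ 1.111. Multiplying by 0.71 at query time gives about 0.789. Keeping idf out of the stored value means new documents only change the idf, not every posting.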
For a multi-term query, generate a query vector from the idf of each term.
Find all the documents that match the query terms, and generate a tf*idf vector for each one.

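These two steps can be sketched with a small dict-based inverted index, with the trie left out for brevity. `INDEX` maps each term to {docId: tf}, using tf = count / length from the three documents. The idf values come from the notes' table. `build_vectors` is a hypothetical name.

```python
# Inverted index: term -> {docId: tf}, tf = count / length, from the three documents.
INDEX = {
    "mr": {1: 2 / 19},
    "green": {1: 2 / 19, 2: 1 / 9, 3: 1 / 16},
}
# idf values from the notes' term table (the formula behind them is not stated).
IDF = {"mr": 1.40, "green": 0.71}


def build_vectors(terms: list[str]) -> tuple[list[float], dict[int, list[float]]]:
    """Return (query vector, {docId: tf*idf vector}) for a multi-term query."""
    query = [IDF.get(t, 0.0) for t in terms]
    # Every document that matches at least one query term (union of postings).
    candidates: set[int] = set()
    for t in terms:
        candidates.update(INDEX.get(t, {}))
    docs = {
        d: [INDEX.get(t, {}).get(d, 0.0) * IDF.get(t, 0.0) for t in terms]
        for d in sorted(candidates)
    }
    return query, docs


query, docs = build_vectors(["mr", "green"])
print(query)  # [1.4, 0.71]
print({d: [round(x, 4) for x in v] for d, v in docs.items()})
# {1: [0.1474, 0.0747], 2: [0.0, 0.0789], 3: [0.0, 0.0444]}
```

Taking the union is what gives docs 2 and 3 a vector for "Mr. Green" even though they lack "mr". Intersecting the postings instead would leave only doc 1.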
word: {
  totalCount: 123,
  docs: { docId: score, ... }
}
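Below is a minimal sketch of the trie, assuming the node where a word ends holds the `word` record: `totalCount` plus the {docId: score} map. The class and method names (`TrieIndex`, `add`, `get`, `with_prefix`) are hypothetical. The example scores are single-field tf values (count / length) from the three documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Entry:
    """Value stored where a word ends, mirroring the `word: {...}` record."""
    total_count: int = 0                                  # totalCount
    docs: dict[int, float] = field(default_factory=dict)  # docId -> score


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    entry: Optional[Entry] = None  # only set where an indexed word ends


class TrieIndex:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, word: str, doc_id: int, count: int, score: float) -> None:
        node = self.root
        for ch in word:  # one trie level per character
            node = node.children.setdefault(ch, Node())
        if node.entry is None:
            node.entry = Entry()
        node.entry.total_count += count
        node.entry.docs[doc_id] = node.entry.docs.get(doc_id, 0.0) + score

    def _find(self, prefix: str) -> Optional[Node]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, word: str) -> Optional[Entry]:
        node = self._find(word)
        return node.entry if node is not None else None

    def with_prefix(self, prefix: str) -> Iterator[tuple[str, Entry]]:
        """Yield (word, entry) for every indexed word that starts with prefix."""
        start = self._find(prefix)
        if start is None:
            return
        stack = [(prefix, start)]
        while stack:
            word, node = stack.pop()
            if node.entry is not None:
                yield word, node.entry
            for ch, child in node.children.items():
                stack.append((word + ch, child))


index = TrieIndex()
index.add("green", 1, 2, 2 / 19)
index.add("green", 2, 1, 1 / 9)
index.add("green", 3, 1, 1 / 16)
index.add("mr", 1, 2, 2 / 19)
index.add("mustard", 1, 1, 1 / 19)
print(index.get("green").total_count)                # 4
print(sorted(w for w, _ in index.with_prefix("m")))  # ['mr', 'mustard']
```

A trie's main advantage over a plain hash map here is prefix lookup. `with_prefix("m")` walks a single subtree instead of scanning every key.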