




Material Type: Notes; Professor: Maier; Class: TOP: INTRO TO MULTIMEDIA NTWRK; Subject: Computer Science; University: Portland State University; Term: Unknown 1989;
Typology: Study notes
Models in IR
I have met with but one or two persons in the course of my life who understood the art of Walking, that is, of taking walks,—who had a genius, so to speak, for sauntering: which word is beautifully derived from “idle people who roved about the country, in the Middle Ages, and asked charity, under pretence of going à la Sainte Terre,” to the Holy Land, till the children exclaimed, “There goes a Sainte-Terrer,” a Saunterer, a Holy-Lander. They who never go to the Holy Land in their walks, as they pretend, are indeed mere idlers and vagabonds; but they who do go there are saunterers in the good sense, such as I mean. Some, however, would derive the word from sans terre, without land or a home, which, therefore, in the good sense, will mean, having no particular home, but equally at home everywhere. For this is the secret of successful sauntering. He who sits still in a house all the time may be the greatest vagrant of all; but the saunterer, in the good sense, is no more vagrant than the meandering river, which is all the while sedulously seeking the shortest course to the sea. But I prefer the first, which, indeed, is the most probable derivation. For every walk is a sort of crusade, preached by some Peter the Hermit in us, to go forth and reconquer this Holy Land from the hands of the Infidels.
- from an essay by Henry David Thoreau
What is this essay about? Justify your answer.
Taxonomy of IR models
Note: See chapter 2 of the Baeza-Yates text for a more complete treatment of the definitions and formalisms
Formal characterization of IR models
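An IR model is a quadruple $[D, Q, F, R(q_i, d_j)]$ where:
$D$ is a set of logical views (representations) of the documents in the collection
$Q$ is a set of logical views (queries) of the user information needs
$F$ is a framework for modeling document representations, queries, and their relationships
$R(q_i, d_j)$ is a ranking function that associates a real number with a query $q_i \in Q$ and a document $d_j \in D$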
Basic concepts
Weighted index terms
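A weight $w_{i,j} \ge 0$ is associated with each pair $(k_i, d_j)$ of an index term $k_i$ and a document $d_j$
If $k_i$ does not appear in $d_j$, then $w_{i,j} = 0$
Document $d_j$ is represented by the index term vector $\vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$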
Boolean model
[Figure: Venn diagram of the documents with index term $k_1$ and the documents with index term $k_2$. Query: $k_1$ AND $k_2$. Documents retrieved: the intersection of the two sets.]
[Figure: same diagram. Query: $k_1$ OR $k_2$. Documents retrieved: the union of the two sets.]
[Figure: same diagram. Query: $k_1$ NOT $k_2$. Documents retrieved: those with $k_1$ but without $k_2$.]
Boolean model
$$\mathrm{sim}(d_j, q) = \begin{cases} 1 & \text{if any of the conjunctive components of the query is satisfied} \\ 0 & \text{otherwise} \end{cases}$$
Note similarity to data retrieval and DB query language
Boolean model
Doc 1: We had so much fun at the Kohler factory that Kaye suggested we check out the GM plant in Janesville. It's a huge plant, 3.5 million square feet, with 3 assembly lines. Two of them make trucks and Bluebird bus frames, but the line we saw makes Chevy Suburbans and similar light trucks, at the rate of one every 67 seconds.
Doc 2: A few overall comments on the Suburban line. Janesville is an assembly plant, so all the parts are made elsewhere, and come to the plant by truck and rail.
Doc 3: The Janesville facility was built by GM in 1919 as the Sampson Tractor Plant, and started making trucks as well the next year. In 1922 they started making Chevrolet passenger cars there.
Doc 4: There is very little inventory of parts on site. Basically, enough parts for one shift arrive at one time by train or truck.

Query | Doc 1 | Doc 2 | Doc 3 | Doc 4
janesville AND parts | | | |
frames OR parts | | | |
(truck OR trucks) NOT cars | | | |
(plant NOT parts) OR (truck AND train) | | | |
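One way to check answers to the exercise above is to evaluate each query with set operations over an inverted index. The following is a minimal sketch, not part of the original notes. It assumes exact matching on lowercased word tokens with no stemming (so "truck" and "trucks" are different terms, as the third query implies), and it reads `A NOT B` as set difference. The names `DOCS`, `tokenize`, `build_index`, and `docs_with` are made up for illustration.

```python
import re

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: ("We had so much fun at the Kohler factory that Kaye suggested we "
        "check out the GM plant in Janesville. It's a huge plant, 3.5 million "
        "square feet, with 3 assembly lines. Two of them make trucks and "
        "Bluebird bus frames, but the line we saw makes Chevy Suburbans and "
        "similar light trucks, at the rate of one every 67 seconds."),
    2: ("A few overall comments on the Suburban line. Janesville is an "
        "assembly plant, so all the parts are made elsewhere, and come to "
        "the plant by truck and rail."),
    3: ("The Janesville facility was built by GM in 1919 as the Sampson "
        "Tractor Plant, and started making trucks as well the next year. "
        "In 1922 they started making Chevrolet passenger cars there."),
    4: ("There is very little inventory of parts on site. Basically, enough "
        "parts for one shift arrive at one time by train or truck."),
}


def tokenize(text):
    """Assumption: index terms are lowercased alphanumeric tokens, no stemming."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs):
    """Map each term to the set of document numbers that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def docs_with(term):
    """Documents containing the term (empty set if the term is unseen)."""
    return INDEX.get(term, set())


# AND is intersection (&), OR is union (|), NOT is set difference (-).
results = {
    "janesville AND parts":
        docs_with("janesville") & docs_with("parts"),
    "frames OR parts":
        docs_with("frames") | docs_with("parts"),
    "(truck OR trucks) NOT cars":
        (docs_with("truck") | docs_with("trucks")) - docs_with("cars"),
    "(plant NOT parts) OR (truck AND train)":
        (docs_with("plant") - docs_with("parts"))
        | (docs_with("truck") & docs_with("train")),
}

for query, retrieved in results.items():
    print(f"{query}: {sorted(retrieved)}")
```

Each document is either retrieved or not; there is no ranking among the retrieved documents, which is the main contrast with the models that follow.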
Vector space model
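$$\mathrm{sim}(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{|\vec{d}_j| \, |\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2} \; \sqrt{\sum_{i=1}^{t} w_{i,q}^2}}$$
$|\vec{q}|$ is the same for all docs; does not affect ranking
$|\vec{d}_j|$ allows normalization for length of the document
$\mathrm{sim}(d_j, q)$ ranges from 0 to +1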
cosine coefficient for similarity can be used with either binary or real-valued term weights
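As a small illustration of the cosine coefficient with both kinds of weights, here is a sketch that is not part of the original notes. The function name `cosine` and the example vectors are made up, and returning 0.0 for an all-zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine(d, q):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        # Assumption: a vector with no non-zero weights matches nothing.
        return 0.0
    return dot / (norm_d * norm_q)


# Binary weights: 1 if the term occurs, 0 otherwise.
binary_sim = cosine([1, 1, 0, 1], [1, 0, 0, 1])

# Real-valued weights (for example tf-idf values); numbers are hypothetical.
weighted_sim = cosine([0.5, 2.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0])

print(binary_sim, weighted_sim)
```

Because all weights are non-negative, both results fall between 0 and 1.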
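Query: Chevy assembly Janesville
Doc 1: Chevy assembly occurs in Janesville at the Chevy factory.
Doc 2: Assembly of cars in Janesville is interesting.
Doc 3: Factory assembly of Chevy cars is interesting.

Term | TF 1 | TF 2 | TF 3 | DF | log(N/n_i) | w_{i,d1} | w_{i,d2} | w_{i,d3} | w_{i,q}
occurs | | | | | | | | |
janesville | | | | | | | | |
factory | | | | | | | | |
cars | | | | | | | | |
chevy | | | | | | | | |
assembly | | | | | | | | |
interesting | | | | | | | | |
(TF 1 to TF 3: raw term frequency; log(N/n_i): IDF)

Doc | Similarity to query
1 |
2 |
3 |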
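The table above can be filled in mechanically. The sketch below is not part of the original notes and rests on several assumptions: weights are raw term frequency times log(N/n_i), the query is weighted the same way, the logarithm is natural, and the words "in", "at", "the", "of", and "is" are treated as stopwords because they are absent from the term list. All identifiers are made up for illustration.

```python
import math
import re

DOCS = {
    1: "Chevy assembly occurs in Janesville at the Chevy factory.",
    2: "Assembly of cars in Janesville is interesting.",
    3: "Factory assembly of Chevy cars is interesting.",
}
QUERY = "Chevy assembly Janesville"

# Assumption: these words are excluded, matching the seven terms in the table.
STOPWORDS = {"in", "at", "the", "of", "is"}


def term_freqs(text):
    """Raw term frequencies of the non-stopword tokens in a text."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in STOPWORDS:
            counts[token] = counts.get(token, 0) + 1
    return counts


doc_tf = {doc_id: term_freqs(text) for doc_id, text in DOCS.items()}
vocab = sorted({term for tf in doc_tf.values() for term in tf})

N = len(DOCS)
df = {term: sum(1 for tf in doc_tf.values() if term in tf) for term in vocab}
idf = {term: math.log(N / df[term]) for term in vocab}  # natural log assumed


def weights(tf):
    """tf * idf weight for every vocabulary term, in vocabulary order."""
    return [tf.get(term, 0) * idf[term] for term in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


q_vec = weights(term_freqs(QUERY))
similarity = {doc_id: cosine(weights(tf), q_vec) for doc_id, tf in doc_tf.items()}
ranking = sorted(similarity, key=similarity.get, reverse=True)

for term in vocab:
    print(term, [doc_tf[d].get(term, 0) for d in DOCS], df[term], round(idf[term], 3))
print({d: round(s, 3) for d, s in similarity.items()}, ranking)
```

Note that "assembly" occurs in all three documents, so its IDF is log(3/3) = 0 and it contributes nothing to any similarity. The base of the logarithm does not change the ranking here, since a different base rescales every weight by the same factor.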
Probabilistic approach
Probabilistic model
A little history: Probabilistic model
Probabilistic model
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
$$P(A \mid B) \, P(B) = P(A \cap B) = P(B \mid A) \, P(A)$$
Probabilistic model
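$$\sum_i \log \frac{P(A_i = a_i \mid L)}{P(A_i = a_i \mid \bar{L})}$$
$$\sum_i \left( \log \frac{P(A_i = a_i \mid L)}{P(A_i = a_i \mid \bar{L})} - \log \frac{P(A_i = 0 \mid L)}{P(A_i = 0 \mid \bar{L})} \right) = \sum_i \log \frac{P(A_i = a_i \mid L) \, P(A_i = 0 \mid \bar{L})}{P(A_i = a_i \mid \bar{L}) \, P(A_i = 0 \mid L)} = \sum_i W(A_i = a_i)$$
$$W(A_i = a_i) = \log \frac{P(A_i = a_i \mid L) \, P(A_i = 0 \mid \bar{L})}{P(A_i = a_i \mid \bar{L}) \, P(A_i = 0 \mid L)}$$
($\bar{L}$ denotes the complement of $L$)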
Probabilistic model
$$w_i = \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$$
$$\sum_i \log \frac{P(A_i = a_i \mid L) \, P(A_i = 0 \mid \bar{L})}{P(A_i = a_i \mid \bar{L}) \, P(A_i = 0 \mid L)}$$
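A minimal sketch of how the sum above could be used as a matching score follows. It is not part of the original notes. It assumes binary attributes (a term is either present or absent), so that a present term contributes $w_i$ and an absent term contributes $\log 1 = 0$. The probability values and the names `term_weight` and `score` are hypothetical.

```python
import math


def term_weight(p, q):
    """w_i = log( p_i (1 - q_i) / (q_i (1 - p_i)) ); needs 0 < p, q < 1."""
    return math.log(p * (1 - q) / (q * (1 - p)))


def score(doc_terms, p, q):
    """Sum the weights of the scored terms that are present in the document."""
    return sum(term_weight(p[t], q[t]) for t in doc_terms if t in p)


# Hypothetical estimates for three query terms.
p = {"chevy": 0.8, "assembly": 0.5, "janesville": 0.6}
q = {"chevy": 0.2, "assembly": 0.5, "janesville": 0.3}

doc = {"chevy", "assembly", "cars"}  # set of terms present in one document
s = score(doc, p, q)
print(s)
```

In this example "assembly" has equal p and q, so its weight is 0; "cars" has no estimates and is ignored; the whole score comes from "chevy".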
Probabilistic model
Contingency table
 | Relevant | Non-relevant | Total
Contains term | r | n - r | n
Does not contain term | R - r | N - n - R + r | N - n
Total | R | N - R | N
Probabilistic model
$$p_i = \frac{r}{R} \qquad q_i = \frac{n - r}{N - R}$$
$$w_i = \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)} = \log \frac{r \, (N - n - R + r)}{(R - r)(n - r)}$$
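To see that the two forms of the weight agree, the sketch below computes it once from the estimates $p_i$ and $q_i$ and once directly from the contingency-table counts. It is not part of the original notes; the counts are invented, and the function names are illustrative only.

```python
import math


def estimates(r, n, R, N):
    """p = r / R and q = (n - r) / (N - R), read off the contingency table."""
    return r / R, (n - r) / (N - R)


def weight_from_estimates(p, q):
    return math.log(p * (1 - q) / (q * (1 - p)))


def weight_from_counts(r, n, R, N):
    # Undefined if r, R - r, n - r, or N - n - R + r is zero
    # (division by zero or log of zero).
    return math.log(r * (N - n - R + r) / ((R - r) * (n - r)))


# Hypothetical collection: N docs, R relevant, n contain the term,
# r relevant docs contain the term.
N, R, n, r = 100, 10, 20, 8

p, q = estimates(r, n, R, N)
w_est = weight_from_estimates(p, q)
w_counts = weight_from_counts(r, n, R, N)
print(p, q, w_est, w_counts)
```

With these counts the ratio inside the logarithm is 8 * 78 / (2 * 12) = 26, so both functions return log 26.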
Probabilistic model
log N