Foundations of Information Retrieval

The lecture gives an introduction to Web Information Retrieval, with particular emphasis on the algorithms and technologies used in modern search engines.

Content

The module begins with an introduction to traditional text IR, including Boolean retrieval, the vector space model, and tolerant retrieval. It then covers the technical basics of Web IR, starting with Web size estimation and duplicate detection, followed by link analysis and crawling. This leads on to modern search engine evaluation methods and various test collections. Finally, applications of classification and clustering in the IR domain are discussed. The theory is illustrated with examples from search systems such as Google, Altavista, and Clusty.

The course covers algorithms, structures, and innovative systems that are relevant in the context of the World Wide Web or have been made possible by it. Its core topics are Web search (Web crawling, text indexing, ranking mechanisms), the analysis and structure of the World Wide Web, data management (search, topologies, systems), and other current topics.

Recommended Literature

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. Available online at nlp.stanford.edu/IR-book/

Participants

Computer science students (recommended from the third semester onward) and ITIS students.

Lecture and Exercise dates

Lectures take place on Tuesdays, 14:15 – 15:45, in room 3703-023.
Tutorial sessions take place on Thursdays, 16:30 – 18:00, in room 1101-F142.

Please refer to Stud.IP for more information.

Exam

The exam questions will be in English, and you can answer in English. All topics discussed in the lectures, exercises, and programming exercises are relevant.

Duration: 120 minutes.
Permitted aids: a non-programmable calculator and a dictionary.

Lecture notes

We mainly use the book "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, which is available online and as a PDF at nlp.stanford.edu/IR-book/.

Lectures and Dates

April 12, 2022    Boolean retrieval

April 19, 2022    Document ingestion, Dictionary and Tolerant Retrieval

April 26, 2022    Dictionary and tolerant retrieval, Indexing, Index Compression

May 3, 2022      Index compression, Scoring, Term weighting, Vector space model

May 10, 2022    Evaluation

May 19, 2022    Query expansion

May 26, 2022    Query expansion (continued), Probabilistic information retrieval

May 31, 2022    Language models for IR

June 14, 2022   Text classification and Naive Bayes

June 21, 2022   Vector space classification

June 28, 2022   Learning to rank

July 5, 2022      Flat and hierarchical clustering

July 12, 2022    Link Analysis

Exercises

Exercises and their solutions are published via Stud.IP.