Course Description

Web Information Retrieval (WIR) is the process of extracting useful information from the vast amount of data available on the World Wide Web. It encompasses a wide range of techniques and technologies for gathering, processing, and analyzing that information. This course is designed to provide a comprehensive introduction to the concepts and techniques used in WIR.

Throughout the course, you will learn the fundamentals of web information retrieval, including the structure of web pages, web scraping, and natural language processing. We will start with web scraping and how to extract data from web pages using Python libraries such as BeautifulSoup and Scrapy. You will also learn the core concepts of natural language processing, including text pre-processing, tokenization, and sentiment analysis.
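As a rough illustration of the first steps described above (pulling links and text out of a page, then tokenizing the text), here is a minimal sketch. It deliberately uses Python's standard-library html.parser module rather than BeautifulSoup or Scrapy so that it runs without third-party packages. The names LinkAndTextParser, tokenize, extract, and SAMPLE_HTML are hypothetical and chosen only for this example; they are not course material.

```python
from html.parser import HTMLParser
import re


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and non-empty text fragments."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only fragments that contain something besides whitespace
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract(html):
    """Return (links, tokens) for an HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, tokenize(" ".join(parser.text_parts))


# Hypothetical sample page, used only for illustration
SAMPLE_HTML = """
<html><body>
<h1>Web Information Retrieval</h1>
<p>Read the <a href="/syllabus.html">syllabus</a> first.</p>
</body></html>
"""

links, tokens = extract(SAMPLE_HTML)
```

The tokenizer here is intentionally naive (ASCII letters and digits only). A real pipeline would more likely rely on a dedicated NLP library for tokenization and would also skip the contents of script and style elements.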

As we move forward, we will delve into more advanced topics such as information retrieval algorithms, machine learning, and deep learning. You will learn how to use information retrieval models such as the vector space model, Okapi BM25, and Latent Semantic Analysis to retrieve relevant information from large corpora. You will also learn how to use machine learning and deep learning techniques to classify and extract information from unstructured data.
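To give a flavor of the ranking functions mentioned above, the following is a small sketch of Okapi BM25 scoring over pre-tokenized documents. It assumes a non-empty corpus with at least one token, uses the commonly seen defaults k1 = 1.5 and b = 0.75, and applies the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)). The function name bm25_scores and the toy corpus are hypothetical, not taken from the course.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query.

    Assumes docs_tokens is a non-empty list of token lists.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption of this sketch)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, used only for illustration
docs = [
    ["web", "search", "engine"],
    ["deep", "learning", "models"],
    ["web", "crawler", "and", "web", "scraping"],
]
scores = bm25_scores(["web", "scraping"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus the third document matches both query terms and ranks first, while the second document shares no terms with the query and scores zero.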

We will also cover best practices for developing web applications, such as how to manage large projects and collaborate with other team members. You will also learn about natural language processing and machine learning libraries such as NLTK, Gensim, and TensorFlow and how to use them to improve your development workflow.

Throughout the course, you will also learn about recent developments in WIR and how to apply them in your applications. By the end of the course, you will have a solid understanding of web information retrieval and the skills to retrieve and extract useful information from the vast amount of data available on the World Wide Web.

Authors: L. Becchetti, A. Vitaletti (Sapienza University of Rome)