As a group assignment (two members) for the "Web Technologies" module of my Master's degree, we had to develop an online social web crawler. The main features of the program were the following:
- Retrieve the webpages
- Extract all the keywords
- Extract all the direct hyperlinks to other pages
- Retrieve all the directly linked pages
- Extract all the keywords contained in the directly linked pages
- Store the keywords and their URLs in inverted indexes in the database
- Retrieve tweets
- Extract the geolocation of the tweets
- Store the data in the database
- Use Google Maps to show the geolocation of all the tweets
The Social Web Crawler was developed using Java Servlets, HTML, FreeMarker and MySQL. Because of extremely tight deadlines, the software was developed in less than a week, during which we had to learn completely new technologies and implement the solution with them.
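To give a rough idea of the crawling and indexing features listed above, here is a minimal sketch in plain Java. It is not the project's actual code: the class name `InvertedIndexSketch`, the method names, the example URLs and the "three or more letters" keyword rule are all assumptions made for illustration, the HTML is handled with simple regular expressions instead of a real parser, and the index lives in an in-memory map where the project used MySQL tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keyword extraction, link extraction and an in-memory inverted index.
class InvertedIndexSketch {
    // Captures the href value of anchor tags (a simplification; a real crawler would use an HTML parser).
    private static final Pattern LINK =
            Pattern.compile("<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    // Matches any HTML tag so that it can be stripped before keyword extraction.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Assumed keyword rule: lowercase words of at least three letters.
    private static final Pattern WORD = Pattern.compile("[a-z]{3,}");

    // Returns the direct hyperlinks found in the page, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Strips the tags, lowercases the text and returns the distinct keywords.
    static Set<String> extractKeywords(String html) {
        String text = TAG.matcher(html).replaceAll(" ").toLowerCase(Locale.ROOT);
        Set<String> keywords = new TreeSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            keywords.add(m.group());
        }
        return keywords;
    }

    // Adds one page to the inverted index: each keyword maps to the set of URLs containing it.
    static void index(Map<String, Set<String>> invertedIndex, String url, String html) {
        for (String keyword : extractKeywords(html)) {
            invertedIndex.computeIfAbsent(keyword, k -> new TreeSet<>()).add(url);
        }
    }

    public static void main(String[] args) {
        // Page contents are hard-coded here; the real crawler retrieved them over HTTP.
        String pageA = "<p>Java crawler <a href='http://example.com/b'>more java</a></p>";
        String pageB = "<p>Crawler tips</p>";

        Map<String, Set<String>> invertedIndex = new TreeMap<>();
        index(invertedIndex, "http://example.com/a", pageA);
        index(invertedIndex, "http://example.com/b", pageB);

        System.out.println("Links on page A: " + extractLinks(pageA));
        System.out.println("Pages containing 'crawler': " + invertedIndex.get("crawler"));
    }
}
```

In a database-backed version, the map would typically become a table of (keyword, URL) rows, so that looking up a keyword returns every page that contains it.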
A complete description of the project along with the guidelines can be found here.
The final WAR file of the project, ready to deploy on Tomcat, can be found here.
The final report of the project can be downloaded from here.
All the files of the project can be found here.