FindMyShoes

This is a project for the Information Retrieval course at SPb AU. We aim to build a search system that retrieves highly relevant e-shop pages, focusing on shoes in the Russian e-commerce market.

There are several caveats when choosing the right shoe model for your feet. Here are some parameters to be aware of (with the original Russian terms in parentheses):

size (the stated size can differ from how the shoe actually fits; Russian even has a word for a shoe that runs small: "маломерка");
lining (подкладка);
insole (стелька);
heel height (высота каблука).

Our information retrieval system aims to reduce the time needed to find the right offer.

Project structure

The src folder contains the source code, organized by purpose:

crawler subfolder contains our crawling scripts. We traverse the Web in breadth-first order, starting from a set of "seed" pages (usually the main pages of e-shops). This gives us the data to search over;
indexing subfolder contains scripts for building the index, which lets us answer search queries efficiently. Specifically, we implement an inverted index: for each word that occurs anywhere in the dataset, we store the list of documents in which that word appears.
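The breadth-first traversal described above can be sketched as follows. This is a minimal illustration, not the project's crawler: the function name bfs_crawl, the get_links callback (which stands in for downloading a page and extracting its links), the max_pages limit and the toy link graph are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(seeds: Iterable[str],
              get_links: Callable[[str], List[str]],
              max_pages: int = 100) -> List[str]:
    """Visit pages in breadth-first order, starting from the seed URLs.

    get_links is a hypothetical callback: a real crawler would download
    the page here and extract the URLs it links to.
    """
    queue: deque = deque()
    seen = set()
    for url in seeds:                  # enqueue each seed once
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # FIFO queue gives breadth-first order
        visited.append(url)
        for link in get_links(url):
            if link not in seen:       # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for real e-shop pages (hypothetical URLs).
links = {
    "shop/": ["shop/men", "shop/women"],
    "shop/men": ["shop/men/boots", "shop/"],
    "shop/women": ["shop/women/heels"],
}
print(bfs_crawl(["shop/"], lambda url: links.get(url, [])))
# ['shop/', 'shop/men', 'shop/women', 'shop/men/boots', 'shop/women/heels']
```

The FIFO queue is what makes the order breadth-first: pages one link away from a seed are visited before pages two links away. The seen set matters because shop pages link back to each other constantly (note "shop/men" linking back to "shop/").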

The data folder contains the data we search over, in several forms:

raw: raw HTML pages and metadata gathered by our crawler. Sample data is included in the repository;
json: "interesting" information extracted from the raw data, such as shoe sizes and names. The schema is loose;
index: search-ready inverted index in binary form.
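A minimal sketch of turning a raw page into a loosely structured json record might look like this. It assumes, hypothetically, that a page lists its sizes after a label such as "Размеры:" ("Sizes:"); the function extract_record and both regular expressions are illustrative only, since real e-shops lay out their pages differently and would need their own extraction rules.

```python
import json
import re
from typing import Any, Dict

# Hypothetical patterns for illustration; real pages need per-site rules.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
SIZES_RE = re.compile(r"Размеры?:\s*([\d.,\s]+)", re.IGNORECASE)  # "Size(s):"
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")  # e.g. "38" or "37.5"


def extract_record(url: str, html: str) -> Dict[str, Any]:
    """Build a loosely structured record: a field is added only if found."""
    record: Dict[str, Any] = {"url": url}

    title = TITLE_RE.search(html)
    if title:
        record["name"] = title.group(1).strip()

    match = SIZES_RE.search(html)
    if match:
        # Deduplicate while keeping the order sizes appear on the page.
        sizes = list(dict.fromkeys(NUMBER_RE.findall(match.group(1))))
        if sizes:
            record["sizes"] = sizes
    return record


html = "<html><title>Ботинки мужские </title><body>Размеры: 38, 39, 40</body></html>"
record = extract_record("https://shop.example/boots/1", html)
print(json.dumps(record, ensure_ascii=False))
# {"url": "https://shop.example/boots/1", "name": "Ботинки мужские", "sizes": ["38", "39", "40"]}
```

Because missing fields are simply omitted rather than filled with placeholders, records from different shops can carry different keys, which is one way to get the loose schema described above.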

You can generate the json and index data with the scripts in the indexing folder.
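The inverted index idea can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration, not the project's indexing code: tokenization is a plain lowercase word split with no stemming, queries are treated as boolean AND, and pickle stands in for whatever binary format the indexing scripts actually write.

```python
import pickle
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN_RE = re.compile(r"\w+")  # \w also matches Cyrillic letters in Python 3


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; no stemming, so "меха" and "меху" differ."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map every word to the sorted list of ids of documents containing it."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """Boolean AND query: ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))  # intersect posting lists
    return sorted(result)


docs = {
    1: "Кожаные ботинки, подкладка из меха",
    2: "Летние кроссовки",
    3: "Зимние ботинки на меху",
}
index = build_index(docs)
print(search(index, "ботинки"))         # [1, 3]
print(search(index, "Зимние ботинки"))  # [3]

# Binary form: pickle is only a stand-in for the project's real format.
blob = pickle.dumps(index)
print(pickle.loads(blob) == index)      # True
```

Answering a query only touches the posting lists of the query words instead of scanning every document, which is why the index makes search efficient.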

Authors

Mike Koltsov
Andrey Kravtsun

ItsLastDay/FindMyShoes

Folders and files

Latest commit

History

Repository files navigation

FindMyShoes

Project structure

Authors

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors 5

Uh oh!

Languages

Packages