
How much memory does it take to index a large site?



ghk
04-23-2005, 07:53 PM
I've tried Zoom and like it, but would like to ask a question before buying.

I have a site that has 35,000 pages and grows by about 10,000 a year. Each page is small in terms of content (up to 500 words each). I'd like to know how much RAM is required to index a site of this size.

Also, can you give an indication of the size of the files generated for the 22,000,000 word Wikipedia search? This would be handy.

Thanks

wrensoft
04-25-2005, 12:42 AM
For 35,000 pages we would recommend using a machine with at least 300MB of RAM installed for indexing. 512MB would be better.

The Wikipedia search can be found here:
http://www.wrensoft.com/cgi-bin/wikipedia/search.cgi
It searches 21,014 files and 22,528,847 words.

The index files used for this total 134MB, which is not too bad when you consider that the source data was probably around 1GB+ in size.
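
To put those numbers in perspective, here is a rough back-of-envelope sketch in Python. It uses only the figures quoted in this thread, treats 1 MB as 10**6 bytes, and assumes index size grows linearly with word count, which is a simplification and not a statement about how Zoom actually stores its index. The function names are made up for illustration, and the 35,000 pages at 500 words each is the upper bound from the original question.

```python
# Figures quoted in this thread for the Wikipedia demo index.
WIKI_INDEX_MB = 134
WIKI_WORDS = 22_528_847
WIKI_SOURCE_MB = 1000  # "around 1GB+", so this is a lower bound on the source size


def index_bytes_per_word(index_mb: float, words: int) -> float:
    """Average index bytes per indexed word (1 MB taken as 10**6 bytes)."""
    return index_mb * 1_000_000 / words


def estimate_index_mb(pages: int, words_per_page: int, bytes_per_word: float) -> float:
    """Linear extrapolation. ASSUMPTION: index size scales with total word count."""
    return pages * words_per_page * bytes_per_word / 1_000_000


bytes_per_word = index_bytes_per_word(WIKI_INDEX_MB, WIKI_WORDS)
index_to_source_ratio = WIKI_INDEX_MB / WIKI_SOURCE_MB

# Upper bound for the site in the question: 35,000 pages of up to 500 words.
site_index_mb = estimate_index_mb(35_000, 500, bytes_per_word)

print(f"Index bytes per word: {bytes_per_word:.2f}")
print(f"Index size as a fraction of source: at most {index_to_source_ratio:.1%}")
print(f"Rough upper estimate for 35,000 pages: {site_index_mb:.0f} MB")
```

Treat the result as an order-of-magnitude guide only. Real index size also depends on things like the number of unique words and the indexing options chosen.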

-----
David

Anonymous
04-27-2005, 07:18 PM
What is required in a larger scenario, for example if someone wanted to make a search engine that indexed many different message board sites, some of which contain over 50,000 posts?

thanks

wrensoft
04-27-2005, 10:38 PM
Message boards typically have a lot of pages that you don't want indexed (e.g. member list pages, profile pages, etc.), so you should filter which pages are indexed. See this previous post on the topic:
http://www.wrensoft.com/forum/viewtopic.php?t=165
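
As a generic illustration of the filtering idea (not Zoom's own configuration, which the linked post covers), the sketch below drops URLs whose script name is on a skip list. The script names and the example.com URLs are hypothetical placeholders; substitute whatever your message board software actually uses.

```python
from urllib.parse import urlparse

# HYPOTHETICAL script names for pages that add nothing to search results.
# Adjust these to match the message board software being indexed.
SKIP_SCRIPTS = {"memberlist.php", "profile.php", "login.php", "search.php"}


def should_index(url: str) -> bool:
    """Return True unless the URL's script name is on the skip list."""
    script = urlparse(url).path.rsplit("/", 1)[-1].lower()
    return script not in SKIP_SCRIPTS


candidate_urls = [
    "http://example.com/forum/viewtopic.php?t=165",
    "http://example.com/forum/memberlist.php",
    "http://example.com/forum/profile.php?mode=viewprofile&u=2",
]

urls_to_index = [url for url in candidate_urls if should_index(url)]
print(urls_to_index)
```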

If you do a good job of filtering, 50,000 posts might in fact be only 10,000 HTML pages (if the message board displays 5 posts per page). In that case you might not need a particularly powerful machine (e.g. 256MB of RAM).
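
The page count above is simple division, rounded up for a partly filled last page. A minimal sketch, with a made-up function name, and assuming every post appears on exactly one page, which ignores the fact that posts are grouped into threads and each thread starts its own page:

```python
def pages_for_posts(posts: int, posts_per_page: int) -> int:
    """Number of HTML pages needed to show all posts, rounding up."""
    if posts_per_page <= 0:
        raise ValueError("posts_per_page must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-posts // posts_per_page)


print(pages_for_posts(50_000, 5))  # 10000
```

With more posts per page the page count drops further, so the board's display settings have a direct effect on how much there is to index.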

-----
David