duplicate content detection is not correct
maryjili
09-16-2008, 05:31 PM
I checked all the CRC matches found, and their contents are only similar, not identical. Is this what duplicate content detection is meant to do?
maryjili
09-16-2008, 05:48 PM
The skipped files counter in the indexing status box always overcounts. When there are 25 skipped files in indexlog.txt, it says 56. Even when there are no skipped files, it is always above 0 and keeps growing as indexing proceeds.
wrensoft
09-16-2008, 08:45 PM
The CRC option is for detecting and removing pages that have identical content but different URLs (not pages that are merely similar).
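To illustrate the general idea of checksum-based duplicate detection, here is a minimal Python sketch. It is not the indexer's actual implementation: the function names, the sample URLs, and the use of `zlib.crc32` over the page text are all assumptions made for illustration. It shows that two URLs with byte-identical content share a CRC, while a page differing by a single character does not.

```python
import zlib
from typing import Dict


def content_crc(text: str) -> int:
    """Return the CRC-32 checksum of a page's text (hypothetical helper)."""
    return zlib.crc32(text.encode("utf-8"))


def find_duplicates(pages: Dict[str, str]) -> Dict[str, str]:
    """Map each duplicate URL to the first URL seen with the same checksum."""
    first_seen: Dict[int, str] = {}   # checksum -> first URL with that content
    duplicates: Dict[str, str] = {}   # duplicate URL -> original URL
    for url, text in pages.items():
        crc = content_crc(text)
        if crc in first_seen:
            duplicates[url] = first_seen[crc]
        else:
            first_seen[crc] = url
    return duplicates


# Made-up sample pages: the first two are identical, the third only similar.
pages = {
    "http://example.com/a": "Welcome to our site.",
    "http://example.com/a?ref=1": "Welcome to our site.",
    "http://example.com/b": "Welcome to our site!",
}
duplicates = find_duplicates(pages)
print(duplicates)
```

Only the second URL is reported as a duplicate here. A checksum match on pages that look different to the eye would, under this scheme, point to a collision or to the checksum being computed over a different portion of the page than expected.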
Regarding the skipped page count: turn on verbose mode so you get a full log before assuming the counter is wrong. There might be files skipped that you are not aware of.
maryjili
09-17-2008, 12:22 AM
Where is the verbose mode located? Do you mean the debug mode?
It is a button on the main index window. Alongside "Start indexing", "Configure", "Exit", there is a button that says "Verbose is off" (when Verbose mode is off) and "Verbose is on" (when Verbose mode is on).