Newer CPUs with many cores


  • Newer CPUs with many cores

    The instructions for indexing "huge" sites (my site has 4 million PDF pages) say: "Fast CPU Dual or quad core preferred. But more than 4 cores doesn't add much benefit."

    I am considering upgrading my base machine to an Intel i9 7900 or an AMD Threadripper 1950, which have 10 and 16 cores (20 and 32 threads) respectively. My current setup, with an i7 6900 and Samsung M.2 NVMe drives, takes about 40 hours to do a full index.

    Does v.7 make full use of high core count CPUs? If not, is better multi-core and multi-thread processing being considered?

    The Intel has better single core speed, but the AMD gives 60% more cores for the same money. For the present and immediate future, what is Wrensoft's advice?

  • #2
    If you are indexing in spider mode, the speed-limiting factor is usually the web server and internet bandwidth, so throwing 16 CPU cores at the problem doesn't help at all. Internet latency means the CPU is idle most of the time.

    If you are indexing in offline mode, the speed-limiting factor is usually hard disk speed. This is changing with the new M.2 SSDs, which put more of the load back on the CPU. I can't imagine a situation where the software could ever use 16+ cores, however. Disk systems just won't be that quick in the near future (the exception being low latency pure RAM drives).

    We do have a plan to add more threading to offline mode, but we still believe that 4 fast cores are (and will be) much better than 16 slow cores.

    As a general comment, very few software applications can use 16 cores in typical use.
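
    One rough way to see why extra cores stop helping is Amdahl's law: if only part of a job can run in parallel, the serial part (disk waits, index locking) caps the overall speedup. The sketch below is only an illustration. The parallel fractions are made-up assumptions, not measurements of Zoom, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work scales with cores.

    parallel_fraction is the share of run time that can be spread across
    cores (an assumed value here, not something measured from the indexer).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only: half, three quarters, and 95% parallel work.
    for fraction in (0.5, 0.75, 0.95):
        for cores in (4, 8, 16):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores:2d} speedup<={speedup:.2f}x")
```

    With a job that is only half parallel, going from 4 to 16 cores gains very little, which is the case for preferring fewer, faster cores when disk access dominates.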







    • #3
      The PDF creation / OCR software I use, CVision's PDF Compressor, uses all available cores: I can watch the Task Manager graphs showing all 16 threads in use. It can use as many cores as are available, well beyond what my i7 6900 offers, and the speed increase is very, very noticeable. When I put in dual Samsung M.2 SSDs in RAID, the speed improved radically.

      Overclocked, the Ryzen 1950 can do 4.3 GHz in boost mode, and the alternative i9 7900 can do about 4.7 GHz without raising the voltage to unsafe heating levels. That is why I wondered if Zoom would be able to make greater use of cores, as even the less expensive Ryzen and i7 chips are at the 8 core / 16 thread level.

      I index offline, and in the past, the faster the machine, the faster the processing. As you allude to, the trend now is toward adding cores, with clock speeds staying close to those of the last several generations. This is particularly true since Intel seems to have lost its predictable Tick / Tock refresh cycle for new generations. So I think, with no big leaps in core speed, it would be super to have Zoom take advantage of more cores.



      • #4
        OCR is a different task from extracting text content that is already in the PDF.
        Clearly Zoom could be better threaded than it is at the moment (when used with fast M.2 drives). But I don't think that extracting the text and adding it to the index will ever use a large number of cores, as disk speed and record locking in the index will be the bottleneck.
        (Creating multiple indexes at the same time from different document sets would, however, use a lot of cores.)



        • #5
          True on the multiple index operations. I have benchmarked doing up to five simultaneous index creations on an 8 core i7 6900 with no degradation in speed of indexing. However, doing over 4 makes any other task on the same computer painfully slow. For many of my different reindexing jobs, I start them at the end of the day and find most complete the next morning, uploaded and ready for verification.

          Future improvements in threading will be a great gift for those with larger document counts.
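
          For anyone wanting to try the simultaneous-index approach described above, here is a minimal sketch of running several independent jobs with a cap on how many run at once. It makes no assumptions about Zoom's own command line: `run_jobs` is a hypothetical helper, and the example commands are placeholders that just print a line, to be swapped for whatever launches each index build on your system.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, max_parallel=4):
    """Run each command as a separate process, at most max_parallel at a time.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(command):
        # Each worker thread just waits on its own child process.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder jobs: replace with the real command for each document set.
    jobs = [
        [sys.executable, "-c", f"print('indexing document set {i}')"]
        for i in range(5)
    ]
    exit_codes = run_jobs(jobs, max_parallel=4)
    print("exit codes:", exit_codes)
```

          Capping the pool at four matches my experience that more than four simultaneous runs leaves the machine too sluggish for anything else.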

