URL indexing is implemented by indexing URL checksums. A URL index database supporting two indexing algorithms, Hash and B+ tree, is built on Berkeley DB, which meets the needs of the parallel crawler. DNS caching is implemented as a client-side cache.
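- URL indexing works by indexing the hash values of URLs. Two URL index databases, Hash and B+ tree, were implemented on top of Berkeley DB; they meet the crawler's need for fast URL lookup and help ensure its normal operation.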
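
The idea of indexing checksums rather than raw URLs can be illustrated with a minimal sketch. It does not use Berkeley DB: a Python set stands in for the Hash index and a sorted list searched by bisection stands in for the B+ tree index. SHA-256 as the checksum function and the names `url_checksum`, `HashUrlIndex`, and `SortedUrlIndex` are all assumptions made for illustration, since the source does not specify them.

```python
import bisect
import hashlib


def url_checksum(url: str) -> bytes:
    """Return a fixed-length checksum for a URL (SHA-256 is an assumption)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


class HashUrlIndex:
    """Hash-style index: a set of checksums with average O(1) lookup."""

    def __init__(self) -> None:
        self._seen = set()

    def add(self, url: str) -> bool:
        """Index the URL; return True if it was new, False if already present."""
        key = url_checksum(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def __contains__(self, url: str) -> bool:
        return url_checksum(url) in self._seen


class SortedUrlIndex:
    """Ordered index: a sorted list of checksums, a stand-in for a B+ tree."""

    def __init__(self) -> None:
        self._keys = []

    def add(self, url: str) -> bool:
        """Index the URL; return True if it was new, False if already present."""
        key = url_checksum(url)
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return False
        self._keys.insert(pos, key)  # keep the list sorted
        return True

    def __contains__(self, url: str) -> bool:
        key = url_checksum(url)
        pos = bisect.bisect_left(self._keys, key)
        return pos < len(self._keys) and self._keys[pos] == key


if __name__ == "__main__":
    index = HashUrlIndex()
    for url in ["http://example.com/", "http://example.com/a", "http://example.com/"]:
        status = "new" if index.add(url) else "already indexed"
        print(url, status)
```

Because every key has the same length regardless of how long the URL is, both structures compare and store fixed-size values. A crawler would typically call `add` before scheduling a URL and skip the URL when it returns `False`.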