Why do I get java.io.FileNotFoundException…(Too many open files) or java.io.IOException…(Too many open files)?
On Linux, the usual upper bound is 1024 file descriptors per process. There are a couple of ways to raise it.

If you run the crawler as non-root (recommended), configure the limit in /etc/security/limits.conf. For example, to set the open files limit for all users in the webcrawler group:

    # Each line describes a limit for a user in the form:
    #
    # domain type item value
    #
    @webcrawler hard nofile 32768

Otherwise, if you run as root (you need to be root to raise ulimits), you can do the following:

    # (ulimit -n 4096; JAVA_OPTS=-Xmx320m bin/heritrix -p 9876)

This raises the ulimit for the heritrix process only.

Below is a rough accounting of FDs used in Heritrix 1.0.x. In Heritrix, the number of concurrent threads is configurable. The default frontier implementation allocates a thread per server. Per server, the frontier keeps a disk-backed queue. Disk-backed queues maintain three backing files with ‘.qin’, ‘.qout’, and ‘.top’ suffixes (One to read from while the other is b
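
To check how close a running JVM is to the per-process descriptor limit discussed above, something like the following sketch may help. It is not part of Heritrix; the class name FdUsage and the method fdCounts are made up for illustration. It assumes a Unix-like JVM that exposes com.sun.management.UnixOperatingSystemMXBean, and reports -1 for both values where that interface is not available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Heritrix.
public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or {-1, -1} if the platform MXBean does not expose them.
     */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), // descriptors currently open
                unix.getMaxFileDescriptorCount()   // limit in effect for this process
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        System.out.println("open file descriptors: " + counts[0] + " of " + counts[1]);
    }
}
```

Running this under the same user and shell as the crawler shows whether a change to limits.conf or ulimit actually took effect for new JVM processes.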