Class: Arachnid2::Typhoeus
- Inherits:
- Object
- Hierarchy: Object > Arachnid2::Typhoeus
- Includes:
- Exoskeleton, CachedResponses
- Defined in:
- lib/arachnid2/typhoeus.rb
Constant Summary
Constants included from CachedResponses
CachedResponses::CACHE_SERVICE_URL
Instance Method Summary
- #crawl(opts = {}) ⇒ Object
- #initialize(url) ⇒ Typhoeus (constructor): A new instance of Typhoeus.
Methods included from Exoskeleton
#bound_time, #bound_urls, #browser_type, #crawl_options, #extension_ignored?, #extract_hrefs, #in_docker?, #internal_link?, #make_absolute, #maximum_load_rate, #memory_danger?, #non_html_extensions, #preflight, #process, #proxy, #skip_link?, #timeout, #vacuum
Methods included from CachedResponses
#check_config, #load_data, #put_cached_data
Constructor Details
#initialize(url) ⇒ Typhoeus
Returns a new instance of Typhoeus.
# File 'lib/arachnid2/typhoeus.rb', line 6

    def initialize(url)
      @url = url
      @domain = Adomain[@url]
      @cached_data = []
    end
Instance Method Details
#crawl(opts = {}) ⇒ Object
# File 'lib/arachnid2/typhoeus.rb', line 12

    def crawl(opts = {})
      preflight(opts)
      typhoeus_preflight

      until @global_queue.empty?
        max_concurrency.times do
          q = @global_queue.shift

          break if time_to_stop?
          @global_visited.insert(q)

          found_in_cache = use_cache(q, opts, &Proc.new)
          return if found_in_cache

          request = ::Typhoeus::Request.new(q, )
          requestable = after_request(request, &Proc.new)
          @hydra.queue(request) if requestable
        end # max_concurrency.times do

        @hydra.run
      end # until @global_queue.empty?
    ensure
      @cookie_file.close! if @cookie_file
    end
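The control flow of #crawl (take up to max_concurrency URLs off the queue, mark each as visited, queue a request for it, then run the whole batch before looping again) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the class BatchCrawlSketch, its max_urls cap (a stand-in for time_to_stop?), and the batches recorder are hypothetical and not part of Arachnid2. It performs no HTTP requests and ignores caching, cookies, and link extraction.

```ruby
require 'set'

# Hypothetical model of the batch loop in #crawl. Names and bodies are
# illustrative assumptions, not the gem's implementation.
class BatchCrawlSketch
  attr_reader :visited, :batches

  def initialize(urls, max_concurrency: 2, max_urls: 10)
    @queue = urls.dup                 # plays the role of @global_queue
    @visited = Set.new                # plays the role of @global_visited
    @max_concurrency = max_concurrency
    @max_urls = max_urls              # simplified stand-in for time_to_stop?
    @batches = []                     # records what each simulated run handled
  end

  def crawl
    until @queue.empty?
      pending = []

      @max_concurrency.times do
        url = @queue.shift
        # Stop filling this batch when the queue is drained or the cap is hit.
        break if url.nil? || @visited.size >= @max_urls
        # Skip URLs that were already handled in an earlier batch.
        next if @visited.include?(url)

        @visited << url
        pending << url                # stand-in for @hydra.queue(request)
      end

      # Stand-in for @hydra.run: process the whole batch, then loop again.
      pending.each { |url| yield url if block_given? }
      @batches << pending unless pending.empty?
    end
  end
end
```

For instance, BatchCrawlSketch.new(%w[a b c a d], max_concurrency: 2) followed by crawl { |url| puts url } would hand the URLs to the block in batches of at most two, skipping the repeated entry. Batching this way bounds the number of in-flight requests while still letting each batch proceed concurrently in the real crawler.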