Spidering Hacks

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You’ll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you’ve gone too far: what’s acceptable and unacceptable). Next, you’ll collect media files and data from databases. Then you’ll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content.
When the Web began, it was a pretty small place. It didn’t take much to keep abreast of new sites, and with subject indexes like the fledgling Yahoo! and NCSA’s “What’s New” page, you could actually give keeping up with newly added pages the old college try.
Now, even the biggest search engines (yes, even Google) admit they don’t index the entire Web. It’s simply not possible. At the same time, the Web is more compelling than ever. More information is being put online at a faster clip, be it up-to-the-minute data or large collections of old materials finding an online home. The Web is more browsable, more searchable, and more useful than it ever was when it was still small. That said, we, its users, can only go so fast when searching, processing, and taking in information.
Thankfully, spidering allows us to bring a bit of sanity to the wealth of information available. Spidering is the process of automating the grabbing and sifting of information on the Web, saving us the trouble of having to browse it all manually. Spiders range in complexity from the simplest script to grab the latest weather information from a web page, to the armies of complex spiders working in concert with one another, searching, cataloging, and indexing the Web’s more than three billion resources for a search engine like Google.
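At its simplest, one step of a spider takes a fetched page, pulls out the links it points to, and keeps only those it is allowed to follow. The sketch below illustrates that step in Python using only the standard library. It is not the book's own approach, which uses Perl and LWP. It also works on an HTML string and robots.txt rules passed in directly, so nothing is fetched over the network. The names `LinkExtractor`, `allowed_links`, and the user agent `ExampleSpider` are hypothetical, and `example.com` is a placeholder site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def allowed_links(page_url, html, robots_txt, user_agent="ExampleSpider"):
    """Return links in `html` that the given robots.txt rules let `user_agent` fetch."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # rules supplied as text, not downloaded
    extractor = LinkExtractor(page_url)
    extractor.feed(html)
    return [url for url in extractor.links if robots.can_fetch(user_agent, url)]


if __name__ == "__main__":
    page = '<a href="/weather">Weather</a> <a href="/private/log">Log</a>'
    rules = "User-agent: *\nDisallow: /private/"
    print(allowed_links("https://example.com/", page, rules))
```

Run as a script, this prints only `https://example.com/weather`, because the robots.txt rules disallow everything under `/private/`. A full spider would repeat this step. It would fetch each allowed link, typically with a delay between requests, and keep a set of already-visited URLs so it does not loop.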
This book teaches you the methodologies and algorithms behind spiders and the variety of ways that spiders can be used. Hopefully, it will inspire you to come up with some useful spiders of your own.
Table of Contents:
Chapter 1 - Walking Softly
Chapter 2 - Assembling a Toolbox
Chapter 3 - Collecting Media Files
Chapter 4 - Gleaning Data from Databases
Chapter 5 - Maintaining Your Collections
Chapter 6 - Giving Back to the World