Spidering Hacks

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content.
When the Web began, it was a pretty small place. It didn't take much to keep abreast of new sites, and with subject indexes like the fledgling Yahoo! and NCSA's "What's New" page, you could actually give keeping up with newly added pages the old college try.
Now, even the biggest search engines—yes, even Google—admit they don't index the entire Web. It's simply not possible. At the same time, the Web is more compelling than ever. More information is being put online at a faster clip—be it up-to-the-minute data or large collections of old materials finding an online home. The Web is more browsable, more searchable, and more useful than it ever was when it was still small. That said, we, its users, can only go so fast when searching, processing, and taking in information.
Thankfully, spidering allows us to bring a bit of sanity to the wealth of information available. Spidering is the process of automating the grabbing and sifting of information on the Web, saving us the trouble of having to browse it all manually. Spiders range in complexity from the simplest script to grab the latest weather information from a web page, to the armies of complex spiders working in concert with one another, searching, cataloging, and indexing the Web's more than three billion resources for a search engine like Google.
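To make "grabbing and sifting" concrete, here is a minimal sketch of the simplest kind of spider step: fetch one page, but only if the site's robots.txt permits it, then pull out the links on that page. It is not taken from the book, whose examples use Perl and LWP. It uses only Python's standard library (`urllib.request`, `urllib.robotparser`, `html.parser`). The user-agent string `ExampleSpider/0.1` and the helper names `extract_links` and `polite_fetch` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute like <a href>, so skip it.
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Sift one HTML document down to the list of links it contains."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def polite_fetch(url, user_agent="ExampleSpider/0.1"):
    """Fetch a page only if robots.txt allows it; return None otherwise.

    The user-agent name is a hypothetical placeholder.
    """
    robots = RobotFileParser(urljoin(url, "/robots.txt"))
    robots.read()  # downloads and parses the site's robots.txt
    if not robots.can_fetch(user_agent, url):
        return None
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A full spider would loop: call `polite_fetch` on a URL, pass the result to `extract_links`, and queue the new links, with a delay between requests so it does not overload the site. Keeping the parsing separate from the fetching means the link extraction can be tried offline, for example `extract_links('<a href="/about">About</a>', "http://example.com/")`.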
This book teaches you the methodologies and algorithms behind spiders and the variety of ways that spiders can be used. Hopefully, it will inspire you to come up with some useful spiders of your own.
TABLE OF CONTENTS:
Chapter 1 - Walking Softly
Chapter 2 - Assembling a Toolbox
Chapter 3 - Collecting Media Files
Chapter 4 - Gleaning Data from Databases
Chapter 5 - Maintaining Your Collections
Chapter 6 - Giving Back to the World