okay, i've packaged up the stuff i used to automate the download of the sti manual. (btw, this is all for windows.)
http://archlab.gmu.edu/~rchong/STI/download-sti-manual.zip
this zip includes three files:
getit.bat is a windows batch file that you'll use to download all the files. you should peek inside it just so you see what's going on (there's a rough sketch of what it does right after this list).
subarucookie.txt is a cookie template you'll need.
curl-xxxx.zip is the program that does the downloading.
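i won't paste the whole batch file here, but whether the real getit.bat loops or just spells out every curl call longhand, the effect is roughly the sketch below. the url path is a placeholder (the real one is in the batch file), so don't run this as-is; it's just so you know what to expect when you peek inside:

    rem sketch only: grab each document number as a pdf, sending the
    rem session cookie from subarucookie.txt along with every request.
    rem SOME_PATH is a placeholder; the real url is in getit.bat.
    for /l %%N in (35094,1,36480) do (
        curl -b subarucookie.txt -o %%N.pdf "http://techinfo.subaru.com/SOME_PATH/%%N"
    )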
1. unzip the curl archive.
2. move getit.bat into the directory where curl.exe exists.
3. move subarucookie.txt into that same directory (see the command sketch right after this list).
4. register with the subaru site and pay your $20 for a 3-day subscription.
5. go to one of the pages and try to download a pdf; e.g., go here.
6. you should get a login page. go ahead and log in.
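for steps 1-3, assuming you unzipped curl to c:\curl and the zip contents landed in c:\downloads (adjust the paths to wherever yours actually are), the moves look like this:

    cd /d c:\curl
    move c:\downloads\getit.bat .
    move c:\downloads\subarucookie.txt .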
now the "hard part".
you need a browser that will let you inspect your cookies. i use opera and it'll show me my cookies. if you're a linux/unix/osx geek, i've found that lynx works well too (once you've logged in, press ctrl-k to see the cookie jar). you should see the cookie for techinfo.subaru.com. note the long alphanumeric string; that's the session ID. copy and paste it into subarucookie.txt where indicated.

(fellaz, please let's not share cookie files, okay? just pay the lousy $20 and do this legally. if you can't get your session ID, lemme know and we'll figure out something.)
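for reference, curl cookie files use the old netscape format: one tab-separated line per cookie. i can't swear the field names below match what's actually in subarucookie.txt (the cookie name here is just a guess); the only part you need to touch is the last field, which is where your long alphanumeric string goes:

    # Netscape HTTP Cookie File
    # tab-separated fields: domain, include-subdomains, path, secure, expires, name, value
    techinfo.subaru.com	FALSE	/	FALSE	0	SESSIONID	PASTE_YOUR_SESSION_ID_HERE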
now open up a command/dos window (start menu, run, "cmd"), cd to the curl directory, and run getit.bat. sit back and watch. it'll take a while; i've got a big pipe and i'm getting ~600 files/hour. if you want to be clever, you can break getit.bat into four separate files and run all four in parallel, as sketched below. this should at least triple the download speed, provided the server isn't limiting the number of connections from one client.
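if you go the parallel route, give each copy a quarter of the 35094-36480 range and kick them all off at once. the getit1.bat through getit4.bat names are just what i'd call the split files; something like:

    rem launch the four split scripts, each in its own window
    start "dl 1" cmd /c getit1.bat
    start "dl 2" cmd /c getit2.bat
    start "dl 3" cmd /c getit3.bat
    start "dl 4" cmd /c getit4.bat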
now, once the download is done, you should use windows' search facility to find .pdf files containing the text string "<html>". the batch file simply tries to retrieve documents numbered from 35094 to 36480. however, some of these documents don't exist; in that case, the server sends back an error page, not a pdf file. so by searching for files containing "<html>", you'll be able to identify which files are not pdfs. just delete them, and that should be it.
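if you'd rather not click through windows' search, findstr (built into windows) does the same job from the command line. put these lines in a little batch file and run it from the download directory (if you type the for line straight at the prompt instead, use %F in place of %%F):

    rem /m prints only the names of files that contain the string;
    rem error pages contain "<html>", real pdfs shouldn't
    findstr /m /i /l "<html>" *.pdf > badfiles.txt

    rem eyeball badfiles.txt first, then delete everything listed in it
    for /f "delims=" %%F in (badfiles.txt) do del "%%F"
    del badfiles.txt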
one weakness of my approach is that a file's name is just its document number, not a descriptive name. i'm not convinced (yet) that this is a big deal, since i plan to just use acrobat to combine all the files into one big pdf. if it's an issue for some of you, i'm sure there's an easy way to rename the files; i'll work on that if requested.
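purely hypothetical, but if somebody puts together a plain-text list mapping document numbers to titles (say names.txt, one "number;title" per line), the renaming is a one-liner in a batch file:

    rem hypothetical names.txt line:  35094;General Description
    for /f "tokens=1,* delims=;" %%A in (names.txt) do ren "%%A.pdf" "%%A - %%B.pdf"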
okay, that's it. *hopefully* this is clear. these instructions will likely have to be revised, and let's hope that techinfo.subaru.com doesn't change things to spite us.
EDIT: this information has changed as the script has evolved with the help of forum members. visit the url provided and download the script; there is a README.TXT included that describes the process.