xela: Photo of me (Default)
[personal profile] xela
I would like to make a local archive of my LiveJournal, including the comments on all entries.

wget -U "Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)" --no-parent http://yakshaver:<password>@yakshaver.livejournal.com/<number>.html

works fine for public entries — but doesn't work for locked entries. Does anyone know of an approach that would work to fetch all my entries?

Date: 2008-02-16 07:49 pm (UTC)
kareila: Seraphim uses her laptop. (laptopangel)
From: [personal profile] kareila
I will email you a perl script I wrote to handle this. I haven't used it in 5 years, though, so it might not be ready out of the box.

Date: 2008-02-16 08:05 pm (UTC)
siderea: (Default)
From: [personal profile] siderea
From the man page:
--load-cookies file
Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file. You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity. Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by --load-cookies: simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations: [examples]

Other browsers.
If you are using a different browser to create your cookies, --load-cookies will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.

If you cannot use --load-cookies, there might still be an alternative. If your browser supports a "cookie manager", you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the "official" cookie support:
wget --no-cookies --header "Cookie: <name>=<value>"
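
As a concrete sketch of that manual-cookie approach for this journal (the cookie name ljsession is an assumption based on what LiveJournal historically used -- copy the real name and value from your browser's cookie manager):

```shell
# Hypothetical invocation: <value> is whatever your browser's cookie
# manager shows for your LiveJournal login cookie (historically named
# "ljsession" -- verify the name in your own browser before trusting it).
wget --no-cookies \
     --header "Cookie: ljsession=<value>" \
     -U "Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)" \
     --no-parent \
     http://yakshaver.livejournal.com/<number>.html
```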

Have you tried that?
Edited Date: 2008-02-16 08:06 pm (UTC)

Date: 2008-02-16 08:09 pm (UTC)
siderea: (Default)
From: [personal profile] siderea
Also see the man page's description of --post-data and --post-file, which includes an example it introduces with: "This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users"
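
A sketch of what that might look like against LiveJournal -- the login URL (/login.bml) and the form field names (user, password) are assumptions based on the old LJ login form, so check them against what your browser actually submits:

```shell
# Step 1 (assumed login endpoint and field names -- verify first):
# log in via POST and save the session cookie to cookies.txt.
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=yakshaver&password=<password>' \
     -O /dev/null \
     http://www.livejournal.com/login.bml

# Step 2: replay the saved session cookie when fetching a locked entry.
wget --load-cookies cookies.txt \
     http://yakshaver.livejournal.com/<number>.html
```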

Date: 2008-02-16 10:48 pm (UTC)
From: [identity profile] alierak.livejournal.com
Try appending ?auth=digest to every URL. Normally LJ ignores HTTP auth and requires a steady diet of cookies, but some parts of the site have optional digest auth support.
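
If that works, the invocation might look like this (a sketch only -- whether digest auth actually covers locked entry pages is exactly the open question here):

```shell
# Supply credentials via wget's HTTP auth options; wget answers the
# server's digest challenge automatically if one is sent.
wget -U "Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)" \
     --http-user=yakshaver --http-password=<password> \
     "http://yakshaver.livejournal.com/<number>.html?auth=digest"
```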

Date: 2008-02-18 03:37 pm (UTC)
From: [identity profile] awfief.livejournal.com
[livejournal.com profile] ljdownload might work for you -- if not, you can probably hack it to make it work. I wrote it before S2 style sheets came in, and S2 totally broke it. It worked for my journal, and it gets comments too. It also strips out LJ-specific formatting, but not the important formatting....

crap, except it looks like the URL is bad.

It's here now:

http://technocation.org/files/software/ljdownload/
