xela ([personal profile] xela) wrote 2008-03-25 07:21 pm

older LJ entries?

I was just looking for an LJ entry about something I did years ago. I happen to have looked at my "profile?mode=full" recently (and boy does it need updating), so I knew I had fewer than 500 total entries. So I typed "http://yakshaver.livejournal.com/?skip=500" into the location bar. And the earliest entry on that page was from roughly two years after I started using LJ. WTF? With a little experimentation I determined that that is in fact the page I get for skip=n for any value of n >= 380. So I can only look at my most recent 400 LJ entries that way. Apparently the only way for me to see older entries in my own journal is with the brain-damaged yyyy/mm/dd/ URL form.

Does this strike anyone else as arbitrary, pointless, and stupid?

[identity profile] zkzkz.livejournal.com 2008-03-26 12:26 am (UTC)
Think about the database query that LJ has to perform to fetch these records. LJ has to do a query which actually has to fetch all those entries, then throw out the first 500 entries and start returning values. If you had some hypothetical high-traffic journal with thousands of entries this could be an intense query which would take several seconds to run. Queries which take several seconds are evil for web sites.
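To make the offset cost concrete, here is a minimal sketch of offset-based paging using Python's standard-library sqlite3 module. The `entries` table, its columns, and the page size of 20 are assumptions for illustration, not LiveJournal's actual schema. Even with an index on `(journal_id, posted_at)`, `OFFSET` only tells the database how many matching rows to step over and discard, so the work still grows with `skip`.

```python
import sqlite3

PAGE_SIZE = 20  # assumed entries per page


def build_demo_db(n_entries):
    """Hypothetical schema: one row per journal entry, all in journal 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, journal_id INTEGER, posted_at INTEGER)"
    )
    conn.execute("CREATE INDEX entries_by_time ON entries (journal_id, posted_at)")
    conn.executemany(
        "INSERT INTO entries (journal_id, posted_at) VALUES (?, ?)",
        [(1, t) for t in range(n_entries)],  # ids 1..n, oldest first
    )
    return conn


def recent_page(conn, journal_id, skip):
    """Offset paging, newest first. The database walks past `skip` matching
    rows and throws them away before returning the PAGE_SIZE rows it keeps."""
    rows = conn.execute(
        "SELECT id FROM entries WHERE journal_id = ? "
        "ORDER BY posted_at DESC, id DESC LIMIT ? OFFSET ?",
        (journal_id, PAGE_SIZE, skip),
    )
    return [row[0] for row in rows]
```

With 500 entries, `recent_page(build_demo_db(500), 1, 480)` returns ids 20 down to 1, but only after stepping over the 480 newer rows.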

LJ has two options. Either optimize the query or ban large offsets.

Offhand, the only solution I see is to label each entry in a journal with an "index" (its position). Then you could index the "index" column and jump straight to the right offset. However, it means inserting a new article in the middle would require renumbering all the existing "index" values after it. If you assigned these "index" values only for articles after the 400th article, then you could reasonably ban inserting articles older than the 400th-oldest article, or something like that.

That's a lot of extra code to handle offsets over 400. Is it worth that much extra code? Plus the performance hit of having another index to maintain on every journal update? And the expense of that extra storage space?
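A minimal sketch of that "index" column idea, again with Python's sqlite3 module. The `pos` column (0 = oldest entry) and the helper names are hypothetical. Paging becomes a range lookup on `pos`, but a backdated insert has to renumber every later entry, which is exactly the cost being weighed here.

```python
import sqlite3


def make_journal():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: pos is a dense chronological position, 0 = oldest.
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, posted_at INTEGER, pos INTEGER)")
    conn.execute("CREATE INDEX entries_by_pos ON entries (pos)")
    return conn


def insert_entry(conn, posted_at):
    """Insert an entry; return how many existing entries were renumbered."""
    # The new entry's position is the number of entries posted at or before it.
    (pos,) = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE posted_at <= ?", (posted_at,)
    ).fetchone()
    # Every entry after the insertion point shifts up by one.
    shifted = conn.execute(
        "UPDATE entries SET pos = pos + 1 WHERE pos >= ?", (pos,)
    ).rowcount
    conn.execute("INSERT INTO entries (posted_at, pos) VALUES (?, ?)", (posted_at, pos))
    return shifted


def page_by_position(conn, skip, page_size=20):
    """Jump straight to the page `skip` entries back from the newest."""
    (newest,) = conn.execute("SELECT COALESCE(MAX(pos), -1) FROM entries").fetchone()
    high = newest - skip
    low = high - page_size + 1
    rows = conn.execute(
        "SELECT posted_at FROM entries WHERE pos BETWEEN ? AND ? ORDER BY pos DESC",
        (low, high),
    )
    return [row[0] for row in rows]
```

Appending in chronological order renumbers nothing; backdating one entry behind the oldest few renumbers everything after it.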

[identity profile] yakshaver.livejournal.com 2008-03-26 01:02 am (UTC)
You're the database guy, not me, but I don't see why you need to add an index to each article; each article already has a unique ID. All you need is a table that has three columns: index, unique ID, and date+timestamp. Whenever a new article is created, you go down the index looking for the first record whose timestamp is earlier than the timestamp of the new article (almost invariably index 1). Insert the new record at that index and increment the rest. If incrementing all those records would suck (I suspect it would, but again, not my area of expertise), just keep the index in chronological order, i.e., if the index of my most recent entry is 433, the index of the new one (unless I've dated it in the past) is 434. Displaying my 80th through 99th most recent entries just requires a little arithmetic on the largest index.
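The chronological-order variant turns paging into arithmetic. A minimal sketch, assuming indexes start at 1, each new entry gets the largest index plus one, and nothing is backdated or deleted; the class and method names are hypothetical.

```python
class AppendOnlyIndex:
    """Maps a per-journal counter (1, 2, 3, ...) to entry IDs."""

    def __init__(self):
        self.by_index = {}
        self.largest = 0

    def add(self, entry_id):
        # Assumes entries arrive in chronological order (no backdating).
        self.largest += 1
        self.by_index[self.largest] = entry_id
        return self.largest

    def recent(self, first, last):
        """Entry IDs for the first-th through last-th most recent entries
        (1 = newest), returned newest first."""
        high = self.largest - (first - 1)
        low = max(1, self.largest - (last - 1))
        return [self.by_index[i] for i in range(high, low - 1, -1)]
```

With 433 entries, `recent(80, 99)` reads indexes 354 down to 335 directly, with no scanning of the newer entries.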

[identity profile] awfief.livejournal.com 2008-03-26 02:06 am (UTC)
The problem is that "the previous 20 entries" is *not* a constant -- it depends on who is looking at your journal. If you have 2 private entries in the previous 20 ones, then when I look at the previous 20, I'd get 18, if it were indexed in the way you are proposing.

[livejournal.com profile] zkzkz is correct -- basically to get entries 400-420 ago, it will get all 420 of the previous entries (that that particular user can see). Then it takes the earliest 20, and throws out the other 400.
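The visibility problem can be shown with a small model. The `Entry` type and its `private` flag are hypothetical stand-ins for LJ's per-entry security levels. Fetching a fixed index range and then filtering leaves a stranger with a short page, while paging over only the visible entries always fills the page but has to walk past everything before it.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    index: int     # position under a per-journal numbering scheme
    private: bool  # hypothetical stand-in for LJ's security levels


def can_see(entry, viewer_is_owner):
    return viewer_is_owner or not entry.private


def page_by_index(entries, low, high, viewer_is_owner):
    """Arithmetic paging: take a fixed index range, then hide what the viewer
    may not see. A stranger can get fewer than high - low + 1 entries."""
    return [e for e in entries if low <= e.index <= high and can_see(e, viewer_is_owner)]


def page_by_scan(entries, skip, page_size, viewer_is_owner):
    """Offset paging over visible entries only, newest first. This toy version
    filters the whole list; a database would still have to walk at least
    skip + page_size visible rows and discard the first `skip`."""
    visible = [e for e in sorted(entries, key=lambda e: e.index, reverse=True)
               if can_see(e, viewer_is_owner)]
    return visible[skip:skip + page_size]
```

With 40 entries and two of the newest 20 private, the index range 21 to 40 yields 20 entries for the owner but 18 for anyone else.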

I suspect another issue is that the code to get the previous entries in *your* journal is similar or identical to the code that gets the previous entries in a community, on a friends list, on a friendsfriends list, etc., and those are even more search-intensive.
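One way to see why a friends page costs more: to show entries `skip` through `skip + page_size` of a merged timeline, entries from every friend's journal have to be merged until that many have been produced in total. A minimal sketch using Python's `heapq.merge`; this is a hypothetical model, not LJ's actual friends-page code.

```python
import heapq
from itertools import islice


def friends_page(timelines, skip, page_size=20):
    """timelines: one list per friend of (posted_at, entry_id), newest first.
    Returns the requested slice of the merged, newest-first timeline."""
    merged = heapq.merge(*timelines, key=lambda e: e[0], reverse=True)
    # Every entry before the page still has to be produced and discarded.
    return list(islice(merged, skip, skip + page_size))
```

The deeper the page, the more entries have to be pulled from across all the friends' journals before anything on the page can be shown.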

Given that you can look at your entry subject lines by month, I suspect most people use that to scan through posts quickly. Then again, it wouldn't be too hard to make a page with all the entries from a particular month.
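A month page needs only a date-range condition, not an offset, so its cost does not depend on how far back the month is. A minimal sketch with sqlite3, assuming `posted_at` is stored as ISO-8601 text so that string comparison matches chronological order; the schema and function names are hypothetical.

```python
import sqlite3
from datetime import date


def make_demo_db(rows):
    """Hypothetical schema: (journal_id, posted_at as 'YYYY-MM-DD HH:MM', subject)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (journal_id INTEGER, posted_at TEXT, subject TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    return conn


def month_bounds(year, month):
    """Half-open [start, end) ISO date strings covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def entries_for_month(conn, journal_id, year, month):
    start, end = month_bounds(year, month)
    rows = conn.execute(
        "SELECT posted_at, subject FROM entries "
        "WHERE journal_id = ? AND posted_at >= ? AND posted_at < ? "
        "ORDER BY posted_at",
        (journal_id, start, end),
    )
    return rows.fetchall()
```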

[livejournal.com profile] zkzkz, what database are you a guru of, if I may ask?

[identity profile] zkzkz.livejournal.com 2008-03-26 09:43 am (UTC)
PostgreSQL. I used to be an Oracle guy but I got better.


Incidentally, if I had been implementing it, I would not have used LIMIT with an offset at all. I would have had "Next"/"Previous" pass along the unique ID of the last/first article on the page. Then the next page would start displaying articles from that article's timestamp.

This would fix the bug I'm sure you've seen, where clicking next/previous sometimes brings up a page that overlaps the current one because new articles have appeared in the meantime.

Some finesse would be required to have this deal with inserting articles in the middle but it could be made to work. It would also let you go arbitrarily far back efficiently. What it wouldn't handle is your desire to jump to random places.
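A minimal sketch of that keyset ("seek") approach with Python's sqlite3 module. The schema is hypothetical; the cursor is the unique ID of the last entry on the current page, whose timestamp is looked up and used as the starting point, with the ID as an assumed tie-breaker for entries posted at the same time. The query still uses LIMIT to cap the page size but has no OFFSET, so the cost does not grow with depth, and new posts at the top cannot shift an older page.

```python
import sqlite3

PAGE_SIZE = 20  # assumed entries per page


def demo_db(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, posted_at INTEGER)")
    conn.execute("CREATE INDEX entries_by_time ON entries (posted_at, id)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", [(i, i * 10) for i in range(1, n + 1)])
    return conn


def newest_page(conn, page_size=PAGE_SIZE):
    rows = conn.execute(
        "SELECT id FROM entries ORDER BY posted_at DESC, id DESC LIMIT ?", (page_size,)
    )
    return [r[0] for r in rows]


def older_page(conn, last_id, page_size=PAGE_SIZE):
    """Entries strictly older than `last_id`, the last entry on the current page."""
    (t,) = conn.execute("SELECT posted_at FROM entries WHERE id = ?", (last_id,)).fetchone()
    rows = conn.execute(
        "SELECT id FROM entries "
        "WHERE posted_at < ? OR (posted_at = ? AND id < ?) "
        "ORDER BY posted_at DESC, id DESC LIMIT ?",
        (t, t, last_id, page_size),
    )
    return [r[0] for r in rows]
```

Posting a new entry after the first page is rendered does not change what `older_page(conn, first[-1])` returns, so the overlap cannot happen. A "newer" link would be the mirror image, starting from the first entry on the page. Jumping to an arbitrary depth is still not supported.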

[identity profile] alierak.livejournal.com 2008-03-26 03:50 am (UTC)
That would be the case when viewing your friends page, or friendsfriends page, but not when viewing an individual journal. That's implemented in a separate function which uses an index and a "LIMIT $skip,$itemshow" clause in the query. The arbitrary part is that $skip is limited to $maxskip, which is in turn limited by $LJ::MAX_SCROLLBACK_LASTN (a global config param, commented as "temporary" where it is used).

The general description of the config param is: "The recent items (lastn view)'s max scrollback depth. That is, how far you can skip back with the ?skip= URL argument. Defaults to 100. After that, the 'previous' links go to day views, which are stable URLs. ?skip= URLs aren't stable, and there are inefficiencies making this value too large, so you're advised to not go too far above the default of 100."
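The clamping described above can be modeled in a few lines. This is a hypothetical Python rendering, not LJ's Perl source. It assumes `$maxskip` is `MAX_SCROLLBACK_LASTN` minus the number of items per page, and that LiveJournal.com set the parameter to 400 with 20 items per page. Those assumed values would reproduce the observation that every `skip` of 380 or more returns the same page.

```python
ITEMS_PER_PAGE = 20          # assumed $itemshow
MAX_SCROLLBACK_LASTN = 400   # assumed site setting; the documented default is 100


def effective_skip(requested, itemshow=ITEMS_PER_PAGE, max_scrollback=MAX_SCROLLBACK_LASTN):
    """Clamp a ?skip= value so the deepest page ends at the scrollback limit."""
    maxskip = max(0, max_scrollback - itemshow)  # assumed formula for $maxskip
    return min(max(0, requested), maxskip)
```

Under the documented default of 100, the same rule would stop at `skip=80`.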

[identity profile] nuclearpolymer.livejournal.com 2008-03-26 01:07 am (UTC)
I noticed another annoying thing, which is that after some point, the functionality for getting the previous 20 entries of someone's stuff stops appearing. Possibly at the same numerical point. And, this problem appeared sometime within the last month and a half, because it worked before.

[identity profile] alierak.livejournal.com 2008-03-26 04:05 am (UTC)
The navigation links in your journal style refer to that view as "Recent Entries", and provide a separate "Archive" view. Go figure.

Another way to see older entries is to use an archiving client instead of the web interface, i.e., one that uses the LJ API to walk through all your entries and copy them to somewhere more greppable.
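An archiving client is mostly a loop. A minimal sketch in Python, where `fetch_older` is a hypothetical callable standing in for whatever LJ API calls a real client makes (it does not model any actual LJ protocol method). Each entry is written to its own text file so the archive can be searched with grep.

```python
import os


def archive_journal(fetch_older, out_dir):
    """Walk a journal from newest to oldest, writing one text file per entry.

    fetch_older(before_id) is hypothetical: it returns a list of entry dicts
    (keys: id, date, subject, body), newest first, that are older than
    before_id, or the newest entries when before_id is None, and an empty
    list when nothing older remains. Returns the number of entries written.
    """
    os.makedirs(out_dir, exist_ok=True)
    before, count = None, 0
    while True:
        batch = fetch_older(before)
        if not batch:
            return count
        for entry in batch:
            path = os.path.join(out_dir, f"{entry['date']}-{entry['id']}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write(f"Subject: {entry['subject']}\nDate: {entry['date']}\n\n{entry['body']}\n")
            count += 1
        # Resume from the oldest entry seen so far.
        before = batch[-1]["id"]
```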