Xweb 1.0 Archive update (almost!)

Status
Not open for further replies.

User1

RETIRED Admin, pm OFF
The Xweb-hosted location for the 1.0 archive is going to be: http://xwebarchives.org/1p0/

I have uploaded all the index and image files, AND am now uploading the massive volume of message contents themselves (200,000+ files in 150,000ish folders, so it could be a while...)

The directory and link structure of the local archived post pages is established and visible in the link urls on all indexes (although Google indexing the content may still take another week or two after the upload is complete)

I'll update this thread when the post content upload process is done ...
 
upload sequence

I've had to suspend further uploading for the moment while I sort out yet another server issue... it's likely to take at least a few days...

If you want an early peek at it in action (sort of: there are still going to be several bugs/error pages to chase down and fix, but I have to get it all on the server at this point to troubleshoot further), go to the bottom of page one and jump back to somewhere between index pages 255 and 335 or so, then try out some post links in that area.

The upload is going in ascending numerical order from message folder number 10000*** to 99999***. The 9s are mainly the very oldest posts. The 1s start several pages in from the back, and the upload moves forward in time from there toward the front page, then loads the earliest posts in the 9s last.

As of right now the server has 10000*** to about 10873*** on it. The full sequence goes to about 12359*** and then skips to the 9s, so the archive overall is about 20% up (and on hold til I get this server config issue resolved)...
 
more bad news

Ok, well, as if this all hasn't been fun enough, I just figured out that Linux server file systems cannot handle more than 32,000 folders in one directory without recompiling a hacked kernel (and the odds of getting the server host to do that seem slim). That means either moving the whole mess to a different host with a Windows server (if that OS can even handle it), or starting the rip over (yet again) from scratch with a different directory structure that breaks the messages down into yet another tier of smaller subdirectories. The second option is technically "possible", but I'm not sure how to make it fly, as the ripper I'm using (the only one of several I've tried that even worked at all on the 1.0 setup) does not seem able to modify the build in that specific manner. In either case I will likely have to scrap it all and start over, again, so, sorry to say, we're not there yet... :sigh: "stay tuned"...
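
For the curious, the arithmetic behind the "another tier of smaller subdirectories" option can be sketched in a few lines of Python. This is only an illustration, not something the ripper does: the 32,000 limit and the roughly 150,000 folder count are the figures quoted in this thread, while the function names, the even split of the sorted folder list, and the "message1", "message2", ... bucket naming are hypothetical choices.

```python
import math

MAX_SUBDIRS = 32_000      # per-directory folder limit quoted in this post
TOTAL_FOLDERS = 150_000   # rough message-folder count quoted earlier in the thread


def buckets_needed(total_folders: int, max_per_dir: int = MAX_SUBDIRS) -> int:
    """Smallest number of parent folders that keeps each one at or under the limit."""
    return math.ceil(total_folders / max_per_dir)


def assign_buckets(folder_names, num_buckets, prefix="message"):
    """Map each folder name to a parent bucket by splitting the sorted list evenly.

    The bucket naming (prefix + 1-based number) is a hypothetical choice.
    """
    ordered = sorted(folder_names)
    per_bucket = math.ceil(len(ordered) / num_buckets)
    return {
        name: f"{prefix}{index // per_bucket + 1}"
        for index, name in enumerate(ordered)
    }


if __name__ == "__main__":
    print(buckets_needed(TOTAL_FOLDERS))
```

With those two figures the answer comes out to five parent folders of about 30,000 message folders each, which leaves only a little headroom under the limit.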
 
Hey Mac, how about offering up a CD

a CD of the archive. Think about it. Burn copies, make a few bucks in the process. Maybe apply some cash to yellow X in man cave. You think?? :2c:
 
thanks

Yes, I had briefly mentioned this idea in the last thread discussing these rip attempts... The coding of the resulting files will need many changes to look/work right offline rather than online, though, and at least right now it is far too big to even fit on a CD. My thought is to get the online version up and functional first, because I anticipate shaking out a lot of the errors in the process. Maybe later, once we have an online version that works to at least a reasonable degree, I'd take the mass file editor on my hdd and start overhauling copies of all these files to work offline as expected. It still won't fit on a CD, so it would need some other method. But yes, it's a good idea conceptually, although I'm not really interested in doing it for profit; maybe free + ship + voluntary donation to the site expense kitty. I don't know. That is a whole other scenario and we need to get the online version to work first, but it's definitely a cool idea worth keeping in the back of mind for down the road... :)
 
upon further research

...it is looking like the one ripper I've found that actually works with Xweb 1.0 (producing something even remotely close to what we could use) is not capable of that kind of directory restructuring on the fly. Rather than let the whole rip project stall out here, unless some new revelation comes forth on the ripper front, I am going to take the full archive that I was just attempting to upload and rework the *** outta it (!) on my hdd :pc: using a mass file editor, to see whether I can apply global changes to all the files so that I can manually split the immense message directory into multiple folders... AND recode the thousands of links that doing so is going to break :rolleyes: I'll let you all know if I make any headway... :mallet:
 
What are you using for a mass file editor?

Just curious. :hypno:
 
thanks

Me doing software/code is endless entertainment if nothing else! :pc: :dunce: :wacko: ;)
 
drumroll please!

(cue sound of trumpets heralding from a hilltop)

UltraEdit - http://www.ultraedit.com/

This thing is effin awesome :cool: I just found it a couple of days ago (the free trial version has full functionality for a month)

It's basically like notepad on massive steroids. It does a million things I don't even know about and will never use, but for our purposes it is capable of (so far) pretty consistent find/replace-in-file operations on a folder containing hundreds of thousands of files per go, among a few other useful items. At these file volumes it takes a lot of time to run so many operations, but it is working (so far), and that's the part I care about! ;)

I've just dumped everything off the server again, broken the message archive on my hdd up into 5 folders, and am running a series of find/replace ops on the html files of the posts to correct the resulting broken link structures throughout. Will it work? I don't know, but it's the best idea I have at the moment... :hmm:
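
For anyone who would rather see the idea as a script than as an editor operation, here is a rough sketch of what one of those find/replace passes amounts to. It is not what UltraEdit does internally, and the real link format in the rip is not shown in this thread: the `message/<folder number>/` link pattern, the folder-to-bucket mapping, and the function names below are all assumptions made up for illustration.

```python
import re
from pathlib import Path

# ASSUMPTION: links in the ripped pages look like href="../message/10000123/index.html".
# The actual link format is not shown in the thread.
LINK_PATTERN = re.compile(r'(href="[^"]*?)message/(\d+)/')


def rewrite_links(html: str, bucket_of: dict) -> str:
    """Point each message link at the bucket folder its message now lives in.

    bucket_of maps a message folder number (as a string) to its new parent
    folder, e.g. {"10000123": "message1"}. Links to folders missing from the
    mapping are left untouched so they can be reviewed by hand.
    """
    def replace(match: re.Match) -> str:
        prefix, folder = match.group(1), match.group(2)
        bucket = bucket_of.get(folder)
        if bucket is None:
            return match.group(0)
        return f"{prefix}{bucket}/{folder}/"

    return LINK_PATTERN.sub(replace, html)


def rewrite_tree(root: str, bucket_of: dict) -> int:
    """Apply rewrite_links to every .html file under root; return how many changed."""
    changed = 0
    for page in Path(root).rglob("*.html"):
        # latin-1 maps every byte to one character and back, so pages in an
        # unknown encoding pass through without being damaged.
        original = page.read_bytes().decode("latin-1")
        updated = rewrite_links(original, bucket_of)
        if updated != original:
            page.write_bytes(updated.encode("latin-1"))
            changed += 1
    return changed
```

Running a pass like this a second time changes nothing, because a rewritten link no longer matches the old pattern. That matters when a long run gets interrupted partway through and has to be restarted.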
 
another update

The mass edit operations appear to have been a success! The revised archive indexes are all up (this time with the new 5 folder message link structure) and the post contents uploads are getting underway once more... it's (again) about 20% uploaded - I'll post a "now open" thread up top when it's all finally there... PS, all the links that were disabled on the N54 site have been restored :)
 
Please do not run rippers against Xweb

Someone out there in internetland besides me appears to be -still- attempting to run a ripper against the old site. If so, please stop (this is my second request). I already got us a rip that's about as good as any we are ever likely to get, and we plan to offer offline copies of it here in the future for free (+ship +optional donation) anyway.

The several attempts it took me to get a functioning rip drove the old site's bandwidth usage way up over the past weeks, and if more people keep whacking it with a ripper it's going to run over its service limit. That would force the site into ad-mode and it would immediately become popup/banner ad hell, which would corrupt any further rip attempts, including my own...

I am forced to lock the old system down from view for now, to conserve what bandwidth remains for my own "in case of" rip needs until our 'official' 1p0 rip is up and we've got some of the errors cleaned up etc.

Just hang in there please folks, as soon as we have a workable archive online, I'll find some way to make offline copies available to interested users :)

Thanks and stay tuned
 
Son of Xweb's 1p0 archive "shakedown cruise" (soon)

Ok gang, here is the latest on the build progress of the local 1p0 archive...

At this point (knock on wood) there is nothing left between us and having the thing on the server but time. The 'message1' and 'message2' segments of the 5-folder archive are up; 'message3', 'message4' and 'message5' are each roughly half up, and as I type the remaining content is in a 3-way parallel upload operation til done...

Slow going, but not too far from it being all on the server. Once that's done, we're going to have a new opportunity for a group site development task. Instead of what we were initially going to do (moving all those threads around on the son of db), we will be doing a full once-over of the completed 1p0 archive to find and fix all errored or skipped links in the rip. That will be the final phase of dev, after the rip itself, the mass edits, and now the uploads...

What we will have at the end of the upload phase is the full archive minus a few things. First, pages that did not get saved correctly due to some parsing or download error, and which therefore either do not exist on the server, exist as an error page, or exist in some garbled state. Second, hyperlinks that did not get converted in the rip translation because they have a "?" question mark in them, meaning a db query as opposed to a direct link. All query links in the rip were skipped deliberately, so that the archive would not be interminably filled with duplicate query result pages etc. The downside is that these links still point to their original locations on the old forum as opposed to their mirrored locale on the 1p0 server.
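
To give an idea of how the leftover query links could be hunted down with a script rather than by eye, here is a small sketch using only the Python standard library. It is not a tool used for this rip: the class and function names are made up, and it simply treats any link with a "?" query part as one that the rip skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_query_links(html: str) -> list:
    """Return the links in a page that carry a non-empty '?' query string."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if urlsplit(href).query]
```

Run over every page in the archive, the output would be a list of candidate links to check by hand; it says nothing about where each one should point on the 1p0 server.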

It may sound like a mouthful or somewhat daunting, and it may in fact turn out to be a longer-term/ongoing project, but it should not be that hard. I cannot get a totally automated count, because some of what I did was manual, but my best estimate is somewhere between 200 and 500 actual errored post files; I cannot say how many query '?' links there are. That is not a bad count, though, considering the total is 200,000+ files and probably well over a million links, and the overwhelming majority of both post contents and interlinks DID successfully translate...

So bottom line is I am going to start a sticky thread in here (we are not quite ready to begin this activity yet, but when we are) where we can all commence to easter-egg/post/report/track these 'bad' links in the archive, and thereby tremendously help me in repairing them as we go :)

[UPDATE: All uploads are now complete, 100% of what I have is online on server, I posted a new thread http://xwebforums.com/forum/index.php?threads/2942/]
 