Eviscerati.org

Comic Transcripts Added to Site!

Comics are difficult to handle on the web because they contain a lot of information that can't be indexed by search engines. This can make trying to find a single comic out of a large archive a rather daunting experience.

Oh No Robot is a service that allows you to create comic transcriptions -- essentially the text "script" of an existing comic -- that can be indexed in place of the comic itself. When these transcripts are added to the comic's page, suddenly it's possible to search your comic archives!

I like what Oh No Robot did, but I wanted to be able to integrate it more fully into my site... and now it is.

It is now possible for me to easily add comic transcripts to my site. People with accounts on this site will notice that they, too, can add transcripts if they wish (in fact the tools to do so will appear automatically beneath each comic). Once a transcription has been added I can lock it down, to prevent it from being defaced at a later date.

If you see any problems with this tool, or you have any suggestions on how it could be improved, please let me know.


Comments


You could make it smaller. We're already reading the comic, no sense in having another chunk of text right below it. Search engines will still find it.

Agreed.

Eventually it won't be displayed on the front page at all... but I need to modify a few things in order for that to happen, because when I take it away I *also* wind up removing the previous/next page links, which I want to keep.


Perhaps you could add the transcript to the metadata section.

The problem is...

... I want someone to be able to see what they've written after they update it, if they decide to add a transcript.

I think on individual comic pages, displaying the comic transcript shouldn't be too big a deal (heck, it would allow someone using a text browser like Lynx to actually read the comic, which is a bit frivolous but kind of neat at the same time). It's the front page where all the extra junk seems cluttered, in my opinion.

If I get too many opinions to the contrary, of course, I may have to revise that.

OCR

From what I can tell, this isn't an automated process... Is it possible to run each of the comics through OCR inside a shell script and map that to each comic page, perhaps as a flat file indexed by either the comic date or the image URL (or some other unique identifier - I'll get to the reason I picked URL in a minute though)?

I'm thinking something along the lines of pulling the comic images from the server with wget (or, if you have them nicely sorted in a directory on the server, just use that), running each through an open-source OCR program such as GOCR/JOCR and piping the output to a text replacement for the image tag which includes the transcription as the alt attribute. I'm not entirely sure, but I think that would make it show up in search engines... if not, the transcription could be appended to the page or something.
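Here is a rough sketch of that pipeline in Python rather than shell, just to make the idea concrete. Everything in it is an assumption for illustration: the function names (run_ocr, img_tag, build_index) are made up, the images are assumed to be PNG files sitting in one directory, and the OCR step assumes a gocr binary on the PATH that accepts "-i <file>" and can read the image format (GOCR natively reads PNM-style files, so PNG may need a converter installed or a conversion step first). The OCR function is passed in as a parameter so a different program can be swapped in.

```python
import html
import subprocess
from pathlib import Path


def run_ocr(image_path, command=("gocr", "-i")):
    """Run an external OCR program and return whatever text it prints.

    Assumption: the command takes the image path as its last argument
    and writes the recognized text to stdout.
    """
    result = subprocess.run(
        [*command, str(image_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def normalize(text):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join(text.split())


def img_tag(url, transcript):
    """Build an <img> tag that carries the transcript as its alt attribute."""
    src = html.escape(url, quote=True)
    alt = html.escape(normalize(transcript), quote=True)
    return f'<img src="{src}" alt="{alt}">'


def build_index(image_dir, base_url, ocr=run_ocr):
    """Map each comic's image URL to its OCR'd text.

    The URL is used as the unique key, as suggested above. The dict
    could be written out as a flat file, one comic per line.
    """
    index = {}
    for path in sorted(Path(image_dir).glob("*.png")):
        index[base_url + path.name] = normalize(ocr(path))
    return index
```

Something like build_index("comics/", "http://example.com/comics/") would give a URL-to-text mapping, and img_tag(url, text) would give the replacement tag for each page. The directory and base URL there are placeholders, not the site's real layout. OCR on hand-lettered comic text is likely to be noisy, so the output would probably still need a human pass before being locked down.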

Because it wouldn't be able to pull the character names from the images, this wouldn't provide a true script, but that's not really necessary for searching anyway.

This is an idea I've been toying with for a while for a couple of different webcomics I read - yours included - but I've never had the time to go anywhere with it. If you're interested in exploring it further feel free to hit my email address (I can't promise I'll remember to check this board later today).