Discover on OSMBlags II

Read will be the first tab finished on OSMBlags, but the next tab (and the next serious technical challenge) will be the Discover tab.

It isn’t enough to deliver content that people are already reading (that’s a bit self-limiting, isn’t it?).  OSMBlags II has to help people find more and better content to read.  The concept is pretty simple: if you read one blog on cooking, you might want to read two.  If you read five blogs on cooking, you’ll probably want to read six.  People are predictable that way.

However, not all content is worth reading.  Most of the internet is just noise.  OSMBlags will have to keep a directory of popular sites to add to your feed.  Popular sites tend to be better, but not always.  (Google’s search algorithms favor sites with lots of incoming links, because Google assumes [usually rightly so] that most sites will link to the most trustworthy source.)  I don’t plan on going the link-counting route, but a carefully managed wiki-style directory of readable content with an automated pruning bot may just do the trick.
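As a rough sketch of what the pruning bot might do, assuming each directory entry records when the blog last posted and whether its feed still responds: the names `DirectoryEntry`, `last_post`, `reachable`, and the 180-day cutoff are all made up for illustration, not a settled design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DirectoryEntry:
    url: str
    last_post: datetime      # when the blog last published (assumed field)
    reachable: bool = True   # whether the feed still responds (assumed field)


def prune(entries, now, max_idle_days=180):
    """Keep entries that are reachable and have posted within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [e for e in entries if e.reachable and e.last_post >= cutoff]
```

A bot like this only removes dead or abandoned listings.  Judging whether a live blog is actually worth reading would still fall to whoever manages the directory.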

But what if I’m interested in reading good writers with opinions on all sorts of things?  Some of my favorite blogs (my own included) cover a lot of different topics.  What if I want to read a post about travel but I don’t want to subscribe to a travel blog?  Is there a way to do this?  This is a lot trickier, but I may have a solution.  If we trust a blog, we can use its tags and keywords to identify the type of content.  Spammers constantly abuse tags and keywords, though, so we have to limit the scope somehow.  The easiest way is to create some kind of community moderation system where people rate content.  They can say, “Yes, this is good” or “No, this is spam.”  We could also rate content by its popularity.  We could even cross-reference different users to say things like, “People like you also liked…”  Netflix is one service that does this.

We could do that.

But we won’t.  I’m concerned about a number of ethical issues.

  • I don’t want to track users’ reading habits that closely.  Users should be allowed an expectation of privacy.  Privacy is a rare find on the internet, but I can build a place where it is found.
  • Popularity-driven sites favor the most established sites.  New authors, new blogs, and new ideas tend to get buried.  I want to give them a leg up.  Merit, not age, should drive content finding.
  • Sites that rate content tend to form a “hive-mind” where the content that is rated best is actually the content that most people agree with rather than the best content.  Dissenting voices, no matter how articulate, are silenced by the mob.

I won’t allow any of those things on my website, which is why I’m looking at heuristic content algorithms.  They can be fooled, but it’s a start.
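To give a feel for what one such heuristic might look like, here is a minimal sketch that judges a post only by its own text, with no user tracking and no popularity score.  It checks whether the tags a post declares actually show up in its body, on the assumption that tag-stuffing spam declares many tags its text never mentions.  The function names and every threshold are invented for illustration, and it only handles single-word tags.

```python
import re


def tag_support(tags, body):
    """Fraction of declared tags that occur as words in the post body."""
    if not tags:
        return 0.0
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    supported = sum(1 for tag in tags if tag.lower() in words)
    return supported / len(tags)


def looks_legitimate(tags, body, min_words=50, max_tags=10, min_support=0.5):
    """Crude check: long enough, not over-tagged, and tags match the text."""
    word_count = len(body.split())
    return (
        word_count >= min_words
        and len(tags) <= max_tags
        and tag_support(tags, body) >= min_support
    )
```

A spammer who repeats every tag in the body would slip straight through, which is exactly the “can be fooled” caveat.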

But there are people who are already combing the internet looking for the best.  There are literally hundreds of content aggregators and online journals.  If OSMBlags can link into them somehow, our problem may have been solved by someone else’s diligence.

