Towards a better commenting system
What is a civil conversation?
Humans love to communicate opinions and ideas. These communications are a form of advertisement that broadcasts the nature of our personalities. Until the arrival of the internet, the primary media for these advertisements were, for most people (businesses aside), in-person conversations, handwritten letters and the telephone. Since the 1990s the tools of this marketing have transformed, and the most popular ones today include sites like Twitter, Facebook and Reddit. These online ecosystems provide a novel way to encounter the opinions and feelings of an individual or a group. A tweeter or a redditor sits (or stands) behind a screen and engages in conversations in ways they never would in person. And that’s why this is no harmless advertisement of ourselves. Our main motive isn’t always to maximize profit by being diplomatic or kind in expressing our opinions; the medium often encourages (mandates, even) that we use aggressive language, make snarky ad hominem comments and troll the other side. Why don’t we just politely argue online, you ask?
Two reasons come to mind:
- Since this new medium lets us stay anonymous, there are few consequences for negative behavior. A competent programmer can create hundreds of fake accounts and spam any forum where she happens to disagree with the majority, and this strategy works extremely well. At the same time, the low cost of trolling can significantly inflate the outrage value of otherwise trivial opinions voiced by minutely famous people. This can occasionally have good effects in terms of raising awareness but usually wreaks havoc. (Check this poor lady’s story and this book for more such depressing stories.)
- Sophistication is boring and not catchy enough to spread like wildfire. Go to YouTube or Facebook and the most-liked comments on a post will usually be shallow one-liners that are either overly congratulatory or passionately angry (depending on the mood of the majority towards that video or article). Sober, lengthy comments aren’t sexy enough, and they require more work to process before we even understand our feelings towards them. They make a poor replicating “virus” because their DNA is ineffective at grabbing the host’s attention. See this video to understand this phenomenon.
So after observing these behaviors over the past few years, it feels safe to assume that the majority of humanity, left on its own, is simply incapable of having civilized and rational discussions on public forums. This is especially true of emotionally charged conversations in domains like politics, religion or surveillance/privacy. These topics are honeypots for the biases and fallacies that define human reasoning, reasoning that evolved to win arguments rather than to find the truth.
And none of this is surprising when you consider how strongly these beliefs are tied to a person’s identity. If my beliefs define me, clinging to them is the best course of action for my survival, a behavior consistently observed in tribes and cults. Emotionally driven comments and ideas are more likely to spread because, past a point, the actual reasoning behind an idea becomes irrelevant and the strong personal association one feels with it takes utmost importance.
Hence my interest in commenting systems that, by their very design, motivate users to disagree properly. Let’s look at some solutions that have been offered in the past.
The well known problem of ranking comments
The problem of ranking comments in a fashion that encourages healthy discussion is a fascinating one, and companies that manage forums have tried different algorithms to make the system better. Back in 2009, Reddit introduced the “best” sorting option, which addressed the problem of comments dominating the top of a thread, and soaking up upvotes, merely because they were posted right after the story went up. As Randall Munroe explains:
reddit is heavily biased toward comments posted early. When a mediocre joke gets posted in the first hour a story is up, it will become the top comment if it’s even slightly funny. (I know this particularly well because I have dozens of reddit fake identities and with them have posted hundreds of mediocre jokes.) The reason for this bias is that once a comment gets a few early upvotes, it’s moved to the top. The higher something is listed, the more likely it is to be read (and voted on), and the more votes the comment gets. It’s a feedback loop that cements the comment’s position, and a comment posted an hour later has little chance of overtaking it – even if people reading it are upvoting it at a much higher rate.
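The “best” sort breaks this feedback loop by ranking comments not by raw score but by the lower bound of a Wilson score confidence interval on the upvote fraction, as Munroe’s post goes on to explain. A minimal sketch in Python (the 95% confidence level is the conventional choice):

```python
from math import sqrt

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score 95% confidence interval for the
    true upvote fraction. Ranking by this value keeps a comment with a
    few early upvotes from leapfrogging one with a long, strong record."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    phat = upvotes / n  # observed upvote fraction
    return (phat + z * z / (2 * n)
            - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# Three early upvotes are not enough to beat a sustained 90% record:
print(wilson_lower_bound(3, 0))    # ~0.44
print(wilson_lower_bound(90, 10))  # ~0.83
```

With few votes the interval is wide and the lower bound stays modest, so only sustained approval pushes a comment up.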
This is supposed to help useful comments posted later (though it doesn’t seem to be working) but it obviously doesn’t address the quality of the comments, which has been degrading consistently over the last few years on Reddit. This isn’t surprising, since any growing community will generally bring more noise than signal, especially on forums that discuss controversial topics. And it’s not just the quality that has degraded; there is also a lack of diversity of opinions. Contrary opinions are not welcome once you have a majority that flourishes and congratulates itself by seeing things in black and white. So in the end, downvotes end up meaning disagreement even though their original intent was to weed out trolls. These downvotes tend to bury a contrarian viewpoint into oblivion even if it was articulated eloquently, which in turn converts a particular forum into an echo chamber. Many small communities within Reddit - aka subreddits - rarely attract a large audience and are still great places to hang out, but they provide no mechanism other than the “report” button or heavy moderation (which can go terribly wrong) to encourage rational discourse.
Coming back to the problem of ranking comments: a system that can recognize an echo chamber and score a thread by the amount of herd mentality, or the lack of contrary opinion, within it would at least keep us aware of the level of bias such a thread might be infected with. (This is less applicable to threads discussing the natural sciences and more to the social and political sciences.) At this point, though, complex ranking algorithms only work towards keeping a good mixture of old, new and popular comments at the top, depending on factors like time since the main post, the user’s karma, the ratio of upvotes to downvotes, and so on. Hacker News’s (HN) ranking algorithm is one of the most complex I have seen. These strategies definitely help readers see the best comments (as decided by the majority) at the top, but they still don’t address the issue of encouraging civil discourse. Since the HN audience predominantly consists of people with a better sense of civility, the comments there aren’t a good measure of the population of comments in general. I should add that the growth of HN has reportedly led to some degradation in quality, and there are reasonable complaints about HN being an echo chamber too. But at least it’s run by people who welcome constructive criticism and are always looking to improve the system.
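For a taste of what these formulas look like, here is the widely cited approximation of HN’s core story-ranking rule; the production system reportedly layers penalties, flagging effects and moderator controls on top, which is where most of the complexity lives:

```python
def hn_rank(points, age_hours, gravity=1.8):
    """Widely cited approximation of HN's core ranking formula:
    accumulated points are divided by a polynomial function of age,
    so a post sinks over time even while it keeps collecting votes."""
    return (points - 1) / (age_hours + 2) ** gravity

# A fresh post with modest points outranks an older, higher-scoring one:
print(hn_rank(40, 2))    # ~3.2
print(hn_rank(150, 10))  # ~1.7
```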
Google’s failed attempt and the arrival of Discourse
In 2013, YouTube tried to tackle spam and trolls and failed miserably. They infamously started forcing people to have a Google+ account in order to comment, and this was (at least superficially) argued to be a measure to reduce abuse and unnecessary vulgarity. But nothing changed for the better (in fact, the situation became worse) and they finally abandoned the requirement.
Discourse is a promising tool that is in the business of promoting healthy discourse. Their Universal Rules of Civilized Discourse highlight the major sources of contamination in healthy online communication. Here is what they aspire to achieve:
Our trust system means that the community builds a natural immune system to defend itself from trolls, bad actors, and spammers — and the most engaged community members can assist in the governance of their community. We put a trash can on every street corner with a simple, low-friction flagging system. Positive behaviors are encouraged through likes and badges. We gently, constantly educate members in a just-in-time manner on the universal rules of civilized discourse.
Although they do a decent job of encouraging healthy discussions and reminding users not to be a dick, there is nothing in the system itself that pushes the community to better understand the nature of a quality comment: one that reflects curiosity, honest skepticism, or simply a willingness to participate constructively and add to the knowledge base of the thread. Because of the ambiguity of these traits, it’s super hard to write programs that detect them. Discourse is definitely a step in the right direction, albeit a small one.
Proposed ideas
In my hunt for proposed methods to improve the situation, I came across an idea on HN about a commenting system that consists of different “leagues”:
There needs to be a way where there is automatic segmenting based on quality of commentary. Maybe like a Major leagues, minor leagues and troll leagues, where there are two or three simultaneous threads going on. People who’ve never commented before and have no karma end up in the minor leagues by default. If they get upvoted enough there, their comments go to the major leagues. Likewise, if their comments get downvoted enough it ends up in the troll leagues, where the trolls can quibble among one another.
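As a toy model of the segmenting this commenter describes, one could route users between parallel threads on karma alone; the thresholds and names below are my own placeholders, not anything from the proposal:

```python
from dataclasses import dataclass

@dataclass
class Commenter:
    karma: int  # net up/down votes accumulated on recent comments

# Hypothetical cutoffs; a real system would have to tune these.
MAJOR_THRESHOLD = 100
TROLL_THRESHOLD = -20

def assign_league(user: Commenter) -> str:
    """New accounts start in the minors; sustained upvotes promote a
    user to the majors, sustained downvotes demote them to the trolls."""
    if user.karma >= MAJOR_THRESHOLD:
        return "major"
    if user.karma <= TROLL_THRESHOLD:
        return "troll"
    return "minor"

print(assign_league(Commenter(karma=0)))    # minor (new user)
print(assign_league(Commenter(karma=250)))  # major
print(assign_league(Commenter(karma=-50)))  # troll
```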
I like this idea, but distinguishing trolls from serious arguments that happen to be politically incorrect isn’t simple, especially given today’s high levels of sensitivity. In recent years it has become terribly hard not to offend people, and one needs to tread carefully when being brutally honest. If we let votes decide who gets relegated to a league of trolls just because they happen to “offend” the majority, then it’s again a failed system that selects for people who are like-minded in their political ideologies (this is exactly what seems to be happening on college campuses in the US). And this will always remain a problem with a homogeneous community of people who get to decide who stays in their league. So there isn’t any completely objective basis on which to relegate (except in obvious cases where the commenter offers nothing beyond derogatory or congratulatory statements, which never contribute a dime to the conversation).
Machine Learning to the rescue?
(Disclaimer: heavy speculation follows, owing to my ignorance of machine learning techniques.)
That’s where I feel machine learning will have to kick in. If we can use a natural language algorithm that does a decent job of deciding which comments are healthy and rational and which qualify as fluff or trolling, we will have some objective basis on which to relegate users to a troll league. I feel optimistic about this speculation because websites like LessWrong and Hacker News can provide hundreds of thousands of comments that plausibly qualify as healthy. The signal-to-noise ratio on these websites is relatively high, which makes them a good dataset for learning what a good comment looks like. On the other hand, YouTube and some horrible news websites like Drudge Report and Salon have some of the most awful comment sections you will find on the interwebz; these can help in learning the nature of unhealthy comments. It will take a few years to pinpoint the exact characteristics that allow an algorithm to tag a comment as “A”, “B” or “C” (those being the names of the different leagues), but one (admittedly not so great) feature that immediately comes to mind is the length of the comment. As with everything there are exceptions, but in my experience longer comments tend to be relatively healthier, so a learning algorithm could assign higher weight to a comment that crosses a particular word count. Of course, Mark Twain would be horrified by such a weighting. The truth is that great advances in “feature detection” for comments, the kind that would improve our discourse on the internet, cannot really happen without significant progress in algorithms that better understand the English language.
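As a sketch of how such a classifier might start out, assuming we already had comments labeled by league (the toy data, labels and bag-of-words features below are placeholders; a real system would need vastly more data and richer features):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data; in practice the "A" examples might be mined from
# highly rated HN/LessWrong threads and the "C" examples from flagged
# YouTube threads, as speculated above.
comments = [
    "I disagree, and here is a study suggesting the opposite effect.",
    "Interesting point; how does this interact with the earlier result?",
    "lol this is so dumb",
    "first!!!",
]
leagues = ["A", "A", "C", "C"]

# TF-IDF + logistic regression: a crude baseline, nothing more.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, leagues)

print(model.predict(["Could you cite a source for that claim?"]))
```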
Another concern is comments that digress from the original topic of discussion. Sometimes they are useful and can provide insightful ideas, but on most occasions they distract a reader who is only interested in the primary discussion triggered by the original post. Could there be a way to sense digression and let the reader filter out the comments that initiate it, along with their children (where “children” means all the descendants of the comment that initiated the digression)? It is again tempting to think of machine learning (more specifically, discourse analysis) as a solution to this problem. But given our current understanding, there is no question that a high number of false positives and false negatives would be disastrous for any system that tries to detect digression.
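One naive proxy for digression, glossing over everything real discourse analysis involves, is topical similarity between a comment and the root post: if it falls below some threshold, hide the comment and its children. The threshold and the bag-of-words representation below are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def starts_digression(root_post: str, comment: str, threshold: float = 0.1) -> bool:
    """Crude proxy: a comment sharing almost no vocabulary with the root
    post is treated as the start of a digression. A real system would use
    semantic embeddings and consider the whole ancestor chain, not just
    the root post."""
    tfidf = TfidfVectorizer().fit_transform([root_post, comment])
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0] < threshold

post = "A discussion of ranking algorithms for comment threads."
print(starts_digression(post, "Ranking comment threads with better algorithms could fix this."))  # False
print(starts_digression(post, "Anyway, what is everyone's favorite pizza topping?"))              # True
```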
Finally, with more advances in natural language understanding, we would want an algorithm that goes through a comment and assesses its veracity by looking for unstated assumptions or facts taken for granted without any good sources to back them up (exceptions would include statements like “the Earth goes around the Sun” or “evolution by natural selection”). So even a great comment that deserves to stay in the A league could carry a tag that says something like “outstanding claim without evidence”. This would be very similar to Wikipedia’s practice of demanding citations for anything an article’s author claims as fact. This final strategy could significantly raise the standard of quality and even make the commenter a better writer, and one would get to filter the comments within the A league by the tags attached to them.
Contemplation or hypothesizing should obviously not come under this scrutiny, and one could simply choose the level of belief to attach to a comment. For example, if I make a hypothesis, I get to pick a tag called “speculation” that tells the algorithm not to run its bullshit detectors on my comment. Hopefully the community would constantly help the algorithm learn, because as I mentioned previously, there are bound to be many errors at first, which would make some of these machine learning systems unpopular and, ironically, authoritarian. There is no improving these systems without some cooperation and patience from the humans using them.
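A crude first pass at such a detector might just flag assertive sentences that carry no link or citation, and skip anything the author has tagged as speculation. Every pattern below is a placeholder; genuine veracity checking would need far deeper language understanding:

```python
import re

# Hypothetical cue lists; a real detector would need NLP, not regexes.
ASSERTION_CUES = re.compile(
    r"\b(studies show|it is known|everyone knows|the fact is|clearly)\b", re.I)
EVIDENCE_CUES = re.compile(r"https?://|\[\d+\]|\baccording to\b", re.I)
HEDGE_TAGS = re.compile(r"\[(speculation|hypothesis)\]", re.I)

def tag_comment(text: str) -> list[str]:
    """Attach an 'outstanding claim without evidence' tag to comments
    that assert facts without pointing at any source. Comments the
    author marks as speculation are exempt from the bullshit detector."""
    if HEDGE_TAGS.search(text):
        return []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if ASSERTION_CUES.search(sentence) and not EVIDENCE_CUES.search(sentence):
            return ["outstanding claim without evidence"]
    return []

print(tag_comment("Studies show that moderation always fails."))
print(tag_comment("Studies show this, according to https://example.com/paper."))
print(tag_comment("[speculation] Maybe moderation always fails."))
```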
This brings me to the final idea we need to look into for moving towards a better commenting system: the creation of writing tools that help humans form structured comments. I am imagining tools that make it convenient for commenters to organize their comments methodically, which in turn lets these tools perform better language processing. Hypotheses, anecdotes, facts and so on are much easier to detect once the user either annotates their comments or uses specific keywords that make annotation easier for algorithms. We see this in programming, where it is trivial for compilers to check a piece of code: programmers follow a set of rules while writing, and those syntactic constraints are what make the “prose” machine-checkable.
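As a sketch of what such annotation might look like, imagine commenters prefixing each block with a lightweight tag that a parser splits out for downstream processing; the tag vocabulary and syntax here are invented for illustration:

```python
import re

# Invented tag vocabulary for illustration.
KNOWN_TAGS = {"claim", "hypothesis", "anecdote", "question", "speculation"}
TAG_PATTERN = re.compile(r"^\[(\w+)\]\s*(.*)$")

def parse_structured_comment(text: str) -> list[tuple[str, str]]:
    """Split an annotated comment into (tag, content) pairs. Untagged
    lines default to 'claim', the strictest category, so unannotated
    assertions still face the fact-checking pass described earlier."""
    parsed = []
    for line in text.strip().splitlines():
        match = TAG_PATTERN.match(line.strip())
        if match and match.group(1).lower() in KNOWN_TAGS:
            parsed.append((match.group(1).lower(), match.group(2)))
        else:
            parsed.append(("claim", line.strip()))
    return parsed

comment = """[anecdote] My subreddit improved after we introduced flairs.
[hypothesis] Visible structure nudges people toward better arguments.
Moderation load dropped by half."""
for tag, content in parse_structured_comment(comment):
    print(f"{tag:>10}: {content}")
```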
Since advances in machine learning are unpredictable, it’s hard to say how far off a proper execution of these ideas is, but there is no reason to doubt that future commenting systems will be monitored and managed by AI that filters bullshit better and incentivizes the members of a forum to engage in rational, civil discourse.