Excitement has been justifiably high the last few days because of a rash of Twitter bots that post a short note whenever an anonymous change is made to Wikipedia from an IP number belonging to a government. It started with @parliamentedits, then Ed Summers wrote anon and used it to set up @congressedits (don’t miss his blog post about it, or the new Wikipedia entry). Then things exploded.
(This is a long post. Good background music while reading is Listen to Wikipedia.)
As always with Ed’s programs, anon is easy to install and use and comes under a free license, so if you’re a bit handy with the command line you’re bound to get it working without too much trouble. That helped the rapid flourishing of similar Twitter bots.
A quick note about how it works:
anon uses wikichanges to listen to the IRC channels that spew a notification every time a Wikipedia page is edited, which is a great way of getting this information out. If you join #en.wikipedia on irc.wikimedia.org you’ll see a steady stream of them, one line per edit.
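anon itself is CoffeeScript on node.js, but the check at its core is simple: does the IP number of an anonymous edit fall inside one of the configured ranges? Here is a minimal Python sketch of that idea, not anon’s code. It assumes ranges are given as inclusive start/end pairs, and the addresses are from the blocks reserved for documentation (RFC 5737), not real government ranges.

```python
import ipaddress

# Hypothetical configuration: inclusive (start, end) pairs of IPv4 addresses.
# These are documentation addresses (RFC 5737), not real government ranges.
RANGES = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.10", "198.51.100.20"),
]


def in_ranges(editor, ranges=RANGES):
    """Return True if the editor is an IP inside any inclusive (start, end) range."""
    try:
        addr = ipaddress.ip_address(editor)
    except ValueError:
        # Not an IP address at all: the edit came from a logged-in account.
        return False
    for start, end in ranges:
        lo, hi = ipaddress.ip_address(start), ipaddress.ip_address(end)
        # Check the version first: comparing IPv4 with IPv6 raises TypeError.
        if lo.version == addr.version and lo <= addr <= hi:
            return True
    return False


print(in_ranges("198.51.100.15"))  # True
print(in_ranges("203.0.113.5"))    # False
print(in_ranges("ExampleUser"))    # False
```

Because signed-in edits carry an account name instead of an IP number, they can never match, which is exactly why these bots only ever report anonymous edits.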
Nick Ruest (we work together at York University Libraries) used anon to set up @gccaedits, which monitors anonymous edits from Government of Canada IP numbers. He got a huge amount of interest from Canadian media, which was great to see. Some links:
- A New Twitterbot Is Tracking the Canadian Government’s Wikipedia Edits (Vice, 14 July 2014)
- Political Staffers Tried to Delete the Senate Scandal (and Other Bad Behaviour) from Wikipedia (Vice, 15 July 2014)
- Twitter account tracks anonymous Wiki edits from House of Commons addresses (The Star, 16 July 2014)
- Dean Del Mastro wants probe into mocking Wikipedia updates on him made from House of Commons computers (National Post, 16 July 2014)
The Dean Del Mastro vandalism is unacceptable but also rather amusing, like the change of profession from “Auto Dealer” to “Dealer of Used Cars with Bent Frames, Perjurer.” (Del Mastro has more to worry about than Wikipedia edits, in any case. As the article says truthfully, “After being charged by Elections Canada with falsifying election documents and knowingly exceeding the Election spending limit, he resigned from the Conservative caucus. Del Mastro faces charges of violating the Canada Elections Act and up to five years in prison with a $5,000 fine.”) Browse the full editing history of the page yourself.
Why it’s anonymous and why it will stop
From the Wikimedia Privacy Policy:
When you make a contribution to any Wikimedia Site, including on user or discussion pages, you are creating a permanent, public record of every piece of content added, removed, or altered by you. The page history will show when your contribution or deletion was made, as well as your username (if you are signed in) or your IP address (if you are not signed in). We may use your public contributions, either aggregated with the public contributions of others or individually, to create new features or data-related products for you or to learn more about how the Wikimedia Sites are used.
Unless this Policy says otherwise, you should assume that information that you actively contribute to the Wikimedia Sites, including personal information, is publicly visible and can be found by search engines. Like most things on the Internet, anything you share may be copied and redistributed throughout the Internet by other people. Please do not contribute any information that you are uncomfortable making permanently public, like revealing your real name or location in your contributions.
And from Account Information and Registration:
[I]f you contribute without signing in, your contribution will be publicly attributed to the IP address associated with your device.
But if you do create an account and sign in, your account name is public but your IP number isn’t. Wikipedia keeps it private for a little while, in case of abuse, then throws it out. You can edit pages pseudonymously and unidentifiably if you create an account and take a bit of care. That’s good.
All this explained to me why it’s only anonymous edits that are being reported: there is no way to know who else on Parliament Hill is editing Wikipedia. (Well, no easy and obvious way for regular people—setting aside CSEC and government IT, who can track anything they want.)
That’s why I bet all the anonymous Wikipedia editing from government offices is going to dry up. Word will get around, staffers will be admonished, politicians will huff and puff … and then anyone wanting to do anything sneaky will use a sock puppet or tether their phones or go to a café. Unless they identify themselves or someone does detective work, we’ll have no idea who they really are.
Another approach to all of this is to begin not with an incomplete set of editors but with a complete set of pages, and tweet every time any one of them is edited. Ed’s anon does this now too, and @congresseditors is one result. Nice work from Ed and all the others who helped!
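In the page-first approach the filter keys on the page title instead of the editor, so it catches signed-in edits too. A minimal Python sketch, assuming each change arrives as a dict with at least a "title" key; the watchlist below is hypothetical.

```python
# Hypothetical watchlist: the titles of every page you care about.
WATCHED = {"Parliament of Canada", "Senate of Canada"}


def watched_changes(changes, watched=WATCHED):
    """Yield every change to a watched page, whoever made it.

    Each change is assumed to be a dict with at least a "title" key,
    e.g. as parsed from the recent-changes feed.
    """
    for change in changes:
        if change.get("title") in watched:
            yield change


feed = [
    {"title": "Parliament of Canada", "user": "ExampleUser"},
    {"title": "Sangria", "user": "192.0.2.7"},
]
for change in watched_changes(feed):
    print(change["title"], change["user"])  # Parliament of Canada ExampleUser
```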
Whatever you post on Wikimedia Sites can be seen and used by everyone.
Good advice about everything online, in fact.
Historical anonymous edits
I like Twitter bots as ambient ways of keeping up with what’s happening, but this deluge of information blasting out was all going too fast for me. I move more slowly. Besides, anon is written in CoffeeScript and node.js, neither of which I’m good at. I began to wonder: what about past edits? What did they show? Aha! This was something I could tackle with Ruby and then make charts in R, which is more my speed. Here’s what I did.
Information about edits made by accounts is available in a nice human-readable way. For example, Special:Contributions/188.8.131.52 shows the most recent changes made by someone at 184.108.40.206, the IP number that made that Dean Del Mastro edit. (Notice the information box at the bottom that points out this is an IP user. “Registering also hides your IP address,” it reminds. The whois link tells you more about who owns this IP number: “CDAGOVN - Government Telecommunications and Informatics Services, CA.”)
But for this I used the MediaWiki API, in particular list=usercontribs: “Gets a list of contributions made by a given user, ordered by modification time.” The API lets you get that Special:Contributions information and a lot more through parameters in the URL. For example:
[Get title, timestamp, sizediff for 50 most recent changes on en.wikipedia.org made by 220.127.116.11](https://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=18.104.22.168&uclimit=50&ucprop=title|timestamp|sizediff&format=json) (JSON)
[Get title, timestamp, sizediff for 50 most recent changes on en.wikipedia.org made by 22.214.171.124](https://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=126.96.36.199&uclimit=50&ucprop=title|timestamp|sizediff&format=xml) (XML)
The XML version will probably be more readable in a browser.
```shell
curl "https://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=188.8.131.52&uclimit=50&ucprop=title|timestamp|sizediff&format=json" | jsonlint | more
```
Perfect for reading into a program and munging.
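As an illustration of that munging, here is a Python sketch that builds the same kind of usercontribs query and pulls the edit records out of the JSON. It is not how my Ruby script does it. I’ve added ids to ucprop on the assumption that it supplies the pageid, revid and parentid columns seen in the output below, and the sample response is hand-written and trimmed.

```python
import json
from urllib.parse import urlencode


def usercontribs_url(user, lang="en", limit=50):
    """Build a MediaWiki API URL listing a user's or IP's most recent edits."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        # ids adds page and revision IDs to title, timestamp and sizediff.
        "ucprop": "ids|title|timestamp|sizediff",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def parse_contribs(text):
    """Return the list of edit records in an API response ([] if none)."""
    return json.loads(text).get("query", {}).get("usercontribs", [])


# A trimmed, hand-written response in the shape the API returns.
sample = """{"query": {"usercontribs": [{"user": "192.0.2.1", "pageid": 21566,
  "revid": 9653831, "parentid": 9624290, "ns": 0, "title": "Noam Chomsky",
  "timestamp": "2005-01-24T22:20:01Z", "sizediff": 9}]}}"""

print(usercontribs_url("192.0.2.1", lang="fr"))
for edit in parse_contribs(sample):
    print(edit["title"], edit["timestamp"], edit["sizediff"])
```

To fetch for real, send the URL with a descriptive User-Agent header, as Wikimedia asks API clients to do. When there are more results than uclimit, the response also carries a continue object whose values go back into the next request; this sketch doesn’t follow it.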
That was just for one IP number. What about for a range? For that, I wrote contributions-by-ip.rb. Given ranges of IP numbers, it runs through each one and queries 37 different Wikipedias (different languages) to find any changes, and then it dumps the results to a comma-separated value file.
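The two loops at the heart of a script like that, over every address in each range and over every language, might look like this in Python. It’s a sketch of the idea, not a port of the Ruby: it only generates the (IP, language) pairs to query, the eight language codes are just the ones that turn up later in this post rather than the full list of 37, and the example range uses documentation addresses.

```python
import ipaddress
from itertools import product

# A subset of language codes; the real script checked 37 Wikipedias.
LANGS = ["en", "fr", "de", "es", "it", "pt", "uk", "sk"]


def ips_in_range(start, end):
    """Yield every IPv4 address from start to end, inclusive."""
    lo = int(ipaddress.IPv4Address(start))
    hi = int(ipaddress.IPv4Address(end))
    for n in range(lo, hi + 1):
        yield str(ipaddress.IPv4Address(n))


def queries(ranges, langs=LANGS):
    """Yield one (ip, lang) pair per API query that needs to be made."""
    for start, end in ranges:
        yield from product(ips_in_range(start, end), langs)


for ip, lang in queries([("192.0.2.1", "192.0.2.2")], langs=["en", "fr"]):
    print(ip, lang)
```

The numbers add up quickly: one /24 (256 addresses) across 37 languages is 9,472 queries, so it’s polite to pause between requests.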
I ran it on the House of Commons ranges that Nick has listed: ["184.108.40.206", "220.127.116.11"] (and everything in between).
Here’s some of the output:
```
user,lang,title,timestamp,pageid,revid,parentid,sizediff
18.104.22.168,en,Noam Chomsky,2005-01-24T22:20:01Z,21566,9653831,9624290,9
22.214.171.124,en,Willie Adams,2005-02-04T20:19:34Z,705884,13096944,9946154,1
126.96.36.199,en,Don Boudria,2005-04-12T15:39:03Z,479215,12458768,12210018,320
188.8.131.52,en,Helena Guergis,2005-04-29T12:01:39Z,1415882,12987067,12971149,-237
184.108.40.206,en,Gerry Ritz,2005-05-02T21:50:03Z,1831626,13179278,0,130
220.127.116.11,en,Gerry Ritz,2005-05-03T15:07:31Z,1831626,13197551,13179278,194
18.104.22.168,en,Jeremy Harrison,2005-05-03T15:24:57Z,1834907,13197734,0,334
22.214.171.124,en,Gatineau (electoral district),2005-05-16T18:41:04Z,1745109,17398215,13794806,0
```
The full file is on GitHub for now.
I like CSV files. They just sit there, solid, unchanging, comfortable, approachable, usable in any tool or language, unencumbered by licenses. CSV is the old sofa of data formats.
Examining with R
Let’s load in some of the usual handy libraries, read the CSV, add a couple of date columns to make things easier later, and then have a first look at the data. There are 4,485 edits recorded. From how many IPs?
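I did this in R. For anyone who prefers Python, here’s a rough equivalent of the loading and counting, using only the standard library. It’s a sketch, not my R code: the sample is three rows copied from the output above, and in practice you’d open the full CSV instead (with open("contributions.csv", newline="") as f, where the file name is hypothetical).

```python
import csv
import io

# Three rows copied from the sample output of the contributions script.
SAMPLE = """\
user,lang,title,timestamp,pageid,revid,parentid,sizediff
18.104.22.168,en,Noam Chomsky,2005-01-24T22:20:01Z,21566,9653831,9624290,9
184.108.40.206,en,Gerry Ritz,2005-05-02T21:50:03Z,1831626,13179278,0,130
220.127.116.11,en,Gerry Ritz,2005-05-03T15:07:31Z,1831626,13197551,13179278,194
"""


def load_edits(f):
    """Read the contributions CSV, converting numeric fields to int."""
    rows = list(csv.DictReader(f))
    for row in rows:
        for field in ("pageid", "revid", "parentid", "sizediff"):
            row[field] = int(row[field])
    return rows


def summarize(rows):
    """Return (number of edits, number of distinct editing IPs)."""
    return len(rows), len({row["user"] for row in rows})


rows = load_edits(io.StringIO(SAMPLE))
print(summarize(rows))  # (3, 3) for this three-row excerpt
```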
Four IPs! What!? Only four IPs at the House of Commons doing the editing? No others? It’s surprising, but reasonable, especially given that we don’t know how the government or House IT department runs things. Surely the House has more than 255 IP numbers for internal use. These four could be the public-facing gateways. (Incidentally, 126.96.36.199 is parl153.parl.gc.ca; the name for parl203 exists, but there’s no parl155 or parl205.)
Let’s see which pages were the most edited.
List of Star Trek novels!? This page has been edited 101 times by one or more anonymous users at the House of Commons? It’s been edited more than any other page? Let’s look closer. When were these edits made?
92 edits on 23 and 24 December 2009. Christmas Eve. Imagine it.
Some poor schnook has to go into work, but there’s nothing going on. The House isn’t sitting. The MPs have all gone back to their ridings. Ottawa is filled with Christmas lights and people shopping and planning for the holidays. Every senior civil servant is at home. But meanwhile our Star Trek-loving staffer has to go in and sit around the office, maybe catching up on complaints from constituents or editing some committee submission. It’s boring. The day is dragging. It’s snowy outside. “The hell with this,” our staffer says. “I’m going to fix up that list of Star Trek novels on Wikipedia.” Our staffer digs into the work for an hour or two, enjoys it, and that evening checks over copies of books at home to make more improvements the next day.
I have no problem with this. This is perfectly acceptable. We’ve all done it. As Ed said in his post:
I wrote this post to make it clear that my hope for @congressedits wasn’t to expose inanity, or belittle our elected officials. The truth is, @congressedits has only announced a handful of edits, and some of them are pretty banal. But can’t a staffer or politician make a grammatical change, or update an article about a movie? Is it really news that they are human, just like the rest of us?
I am going to delete these edits from the data set so they don’t obscure the rest of the analysis.
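The counting and the deletion in this section can be sketched in Python too, assuming each edit is a dict with lang, title and timestamp keys, as produced by reading the CSV with csv.DictReader. The function names are mine, for illustration.

```python
from collections import Counter


def most_edited(rows, n=10):
    """The n most edited (lang, title) pairs, most edits first."""
    return Counter((r["lang"], r["title"]) for r in rows).most_common(n)


def edits_per_day(rows, title):
    """Edits to one page per day. Days are UTC dates taken from the ISO 8601
    timestamps, so late-evening Ottawa edits land on the next day."""
    return Counter(r["timestamp"][:10] for r in rows if r["title"] == title)


def drop_page(rows, title):
    """Return a copy of rows without any edits to the given page."""
    return [r for r in rows if r["title"] != title]
```

With the full data loaded into rows, most_edited(rows) answers “which pages?”, edits_per_day(rows, "List of Star Trek novels") answers “when?”, and rows = drop_page(rows, "List of Star Trek novels") clears them out before going on.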
Edits by language
Let’s see which languages are being edited. Turns out it’s almost all English, some French, then a little bit of German, Spanish, Italian, Portuguese, Ukrainian and Slovak. We’ll make a list of all the non-English/French pages that were edited.
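Counting by language and listing the pages outside English and French is a couple of lines each. Again a Python sketch, assuming each edit is a dict with lang and title keys; the function names are illustrative.

```python
from collections import Counter


def edits_by_language(rows):
    """Edit counts per Wikipedia language code, largest first."""
    return Counter(r["lang"] for r in rows).most_common()


def pages_outside(rows, langs=("en", "fr")):
    """Sorted, de-duplicated (lang, title) pairs not in the given languages."""
    return sorted({(r["lang"], r["title"]) for r in rows if r["lang"] not in langs})


edits = [
    {"lang": "en", "title": "Ontario"},
    {"lang": "sk", "title": "Kanada"},
    {"lang": "de", "title": "Brand von Washington"},
]
print(pages_outside(edits))  # [('de', 'Brand von Washington'), ('sk', 'Kanada')]
```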
Interesting mix there. Brand von Washington (de) sounds like a Pynchon character, but it’s the Burning of Washington in 1814, a popular feature of Canadian history. Steven Blaney is a Conservative MP; Isabelle Morin is an NDP MP. Sangria is a delicious wine-based drink. The one edit on Kanada at sk.wikipedia.org is a correction to say we’re a constitutional monarchy and that the United States and France (via Saint Pierre and Miquelon) are our neighbours. That’s a good update.
What are the most edited pages on the French language Wikipedia?
Jamie Nicholls, Isabelle Morin and Laurin Liu are all NDP MPs. Maria Mourani was expelled last year from the Bloc Québécois and now sits as an independent. Céline Hervieux-Payette is a Liberal senator. They are all from Quebec.
Concours Eurovision de la chanson 2009 is the 2009 Eurovision Song Contest. Looks like there was one fan who was making minor edits over nine months. That’s just like the Star Trek novels thing.
Enough of that; let’s make a chart.
Charts of all edits
Now let’s break it down by IP number:
Hmm. Why are the edit counts so different for the four IP addresses? How are they allocated? I would like to know.
Edits since the 2011 election
Marking elections made me wonder how the editing has been since the last one, when the Conservatives won a majority. It turns out Jamie Nicholls has the most edited page(s), so let’s find out exactly where and when those edits were made.
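Narrowing to the period since the election and finding the busiest page-days can be sketched in Python as follows, assuming each edit is a dict with lang, title and ISO 8601 timestamp keys. The 2011 federal election was held on 2 May 2011; the function names are illustrative.

```python
from collections import Counter

# Date of the 2011 Canadian federal election.
ELECTION_2011 = "2011-05-02"


def since(rows, date=ELECTION_2011):
    """Rows edited on or after date. ISO 8601 date strings sort
    chronologically, so a plain string comparison is enough."""
    return [r for r in rows if r["timestamp"][:10] >= date]


def busiest_page_days(rows, n=5):
    """The n busiest (lang, title, UTC date) combinations, with edit counts."""
    return Counter(
        (r["lang"], r["title"], r["timestamp"][:10]) for r in rows
    ).most_common(n)
```

busiest_page_days(since(rows)) then says both where and when: which page, in which language, on which day.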
Looks like 13 September 2013 was a busy day on Jamie Nicholls (fr). We can use the revisions API to get [a list of all of the changes that day, with username, timestamp and comment](https://fr.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Jamie%20Nicholls&rvstart=2013-09-12T00:00:00Z&rvend=2013-09-14T00:00:00Z&rvdir=newer&rvlimit=20&rvprop=timestamp|user|comment). There was an edit war going on, I think with 188.8.131.52 defending Nicholls: one comment says, “L’auteur ajoute de l’information trompeuse à propos de Jamie Nicholls en utilisant des références qui ne soutiennent pas le caractère critique de ses propos,” which Google translates as “The author adds misleading information about Jamie Nicholls using references that do not support the critical nature of his remarks.” There is some back and forth.
The [listing of edits to Douglas Kinsella on 24 May 2013](https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Douglas%20Kinsella&rvstart=2013-05-24T00:00:00Z&rvend=2014-05-25T00:00:00Z&rvdir=newer&rvlimit=20&rvprop=timestamp|user|comment) shows an edit war between 184.108.40.206 and Glaisher. It starts with 220.127.116.11 vandalizing the page, then it goes back and forth with the page being repaired over and over until it is marked protected to prevent more vandalism from the House of Commons IP.
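Both listings come from the same kind of revisions query, so here’s a Python sketch that builds one for any page and time window. It mirrors the parameters in the links above but asks for JSON; the function name and the start-before-end check are my own additions, not part of the API.

```python
from urllib.parse import urlencode


def revisions_url(title, start, end, lang="en", limit=20):
    """Build a MediaWiki API URL for a page's revisions between two
    ISO 8601 timestamps, oldest first."""
    if start > end:
        # With rvdir=newer the enumeration runs forward in time,
        # so the earlier bound has to come first.
        raise ValueError("start must not be later than end")
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvstart": start,
        "rvend": end,
        "rvdir": "newer",
        "rvlimit": limit,
        "rvprop": "timestamp|user|comment",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


print(revisions_url("Jamie Nicholls", "2013-09-12T00:00:00Z",
                    "2013-09-14T00:00:00Z", lang="fr"))
```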
Every Wikipedia user also has a Talk page, where other people can talk to the user. User_talk:18.104.22.168 has a May 2013 section with a complaint, a warning and then a message saying the account was blocked from editing for 48 hours. There’s another complaint in June from another user: “Please note that you are not permitted to overwrite Wikipedia’s articles about Members of Parliament with their own self-penned and unreferenced biographies.”
Here are the talk pages for all four IPs. They’re all marked as being shared IPs, and they’ve all been blocked for vandalism or come very close.
- User_talk:22.214.171.124. May 2013: “You have been blocked from editing for a period of 48 hours for your disruption caused by edit warring and violation of the three-revert rule at Douglas Kinsella.”
- User_talk:126.96.36.199. May 2009: “This is the only warning you will receive for your disruptive edits.”
- User_talk:188.8.131.52. July 2014: “You may be blocked from editing without further warning the next time you vandalize Wikipedia, as you did at Small Dead Animals.”
- User_talk:184.108.40.206. May 2011: “Please stop adding inappropriate external links to Wikipedia, as you did to Ontario. It is considered spamming and Wikipedia is not a vehicle for advertising or promotion.”
Back to the recent edits. The Jamie Nicholls edits were defending against an anti-Nicholls user; the Kinsella changes were pure vandalism. What about the next two most edited pages?
The history for Isabelle Morin (fr) for 5 March 2014 shows edits being reverted for being “trop promotionnel et modifié par le bureau même de la députée” (“too promotional, and edited by the MP’s own office”). The history for Chris Warkentin shows the same: one edit by 220.127.116.11 was reverted with the comment “Reverting back to remove another copy and paste bio from a HOC [House of Commons] IP. Copy and pastng online bios is not how Wikipedia articles are written.”
I’m not going to dig into all this any more here, especially since we’ve already seen the Dean Del Mastro vandalism. There’s much more analysis that could be done, but it seems to be the case that recent edits are generally not helpful.
On the other hand, there really are very few of them.
That is not many edits at all. Given everything happening a) at the House of Commons and b) at Wikipedia, it’s minuscule. The vandalism is unacceptable but given what the Conservative government is doing to the country it’s trivial.
Stop being idiots
I have nothing at all against anonymous edits on Wikipedia. I make them myself. And I don’t know how government IT runs its networks in the House of Commons, where those IPs are allocated or how they’re shared or why only those four IPs are seen on Wikipedia.
But to those people in the House doing the vandalizing: stop being idiots. Set up a real account and be more constructive. You know you can’t get away with it any more. Be helpful, whether that means editing pages about Canadian subjects (in English, French or other languages) or Star Trek novels or something else.
Wikipedia’s data access is wonderful
I know there’s a lot of academic research on Wikipedia but I’ve hardly looked at any. I will now.
I mentioned the Special:Contributions pages before. You can use them to check on what each of the four IPs is doing:
- Special:Contributions/18.104.22.168 (feed)
- Special:Contributions/22.214.171.124 (feed)
- Special:Contributions/126.96.36.199 (feed)
- Special:Contributions/188.8.131.52 (feed)
Jari Bakken (@jarib) did something similar to this for Norway: Anonymous Wikipedia edits from the Norwegian parliament and government offices. Looks like they’re more active over there than here.
He did this using Wikipedia data dumps.
I just saw that he also pulled Anonymous Wikipedia edits from the Government of Canada using Google BigQuery’s Wikipedia Revision History (which goes from 2002–2010). Nice! See Anonymous Wikipedia edits from around the world for more.
Through ARIN I found an IP range apparently belonging to the Communications Security Establishment Canada, Canada’s signals intelligence spy agency, our equivalent of the NSA or GCHQ: 184.108.40.206 - 220.127.116.11. I checked those IPs and only found two anonymous edits:
- Add Tina Lamontagne to ‘List of philanthropists’ (reverted 34 minutes later)
- Change ‘verify’ to ‘certify’ in Certificate authority