Reverse Engineering the Medium Earnings Model for Authors - Part 1

Reverse Engineering the Medium Earnings Model for Authors

Part 1 — Using data to analyze the Medium earnings model

Let's Begin. My original title for this story was "How to (Not) Make Money Writing on Medium", since as they say, you should write what you know. I'm six weeks into this endeavor with six stories and a whopping $3 of earnings. I am thus a certified expert on not making money on Medium.

I want to understand why this is so difficult. Perhaps I can be smarter.

My findings, shared here, are based on data I gathered from the profiles of other Medium authors as well as my own meager Earnings page. Read on, and I'll give you some data-based recommendations to help you increase your own earnings. Not hearsay. Not conjecture. Just the facts, ma'am.

In Part 1 of this multi-part story we'll just be getting started. There's a lot of information to be mined and analyzed, so we'll need additional stories to do all of this data justice. See the link at the bottom of the story for Part 2.

What prompted this analysis? I have a mere 19 followers to date, and 3 of those follow by email. Unsurprisingly, most of my followers are people I know. I also noticed that my email followers are other writers with interests far outside of my own. Why are they following me?

Further examination shows they each follow a huge number of authors besides me, in one case over 50,000. So I can safely assume they are not following me because of my exceptional writing skills and riveting stories. There must be a reason why they click away and risk carpal tunnel to follow so many other authors.

A little searching uncovers a strategy called "Reciprocity" — that is, I'll follow you if you follow me, and if you don't, I'll unfollow you. Why are followers important? Let's deep dive into the Medium earnings model and see if we can figure it out. Let's also try to figure out if following others will help increase our own number of followers.

The Author Profile Data Set

Let's begin our analysis by examining the Medium data that is publicly available, which might help us understand the earnings model. Medium doesn't offer much in the way of a Developer API, so we'll need to gather our data the old fashioned way. For this study, I grabbed data from about 500 author profiles, encompassing about 50,000 stories. I'll spare you the details, but let's just say it helps to be a programmer.

Each Medium author profile page offers the following public information:

Number of followers
Number of authors following
Date joined
List of stories published

And for each story:

Date published
Number of claps
Number of comments
Expected read time (based on word count)
Publication, if applicable
Number of stories (not stated explicitly, but we can count them)

Capturing the number of highlights, and the text highlighted, is also possible and interesting, but it's enough extra work that we'll push that off for another day.

500 profiles provides enough data to draw some interesting conclusions, but it's by no means representative of all of Medium. I gathered the data by starting with a successful author (meaning, a relatively high number of followers), and traversed a few of their following links to find other authors, traversed a few of their following links, and so on. Thus, this data likely skews towards more successful authors since it's likely that successful authors will tend to follow other successful authors.

The Medium Earnings Model

If we want to improve our earnings, we'll need to better understand the Medium earnings model. So let's start our analysis there and use what we learn about the earnings model to drive our forthcoming investigations.

I know a lot of readers are probably math-averse but stay with me, we'll keep it high level.

Medium provides a pretty good overview of their earnings model, but details are sketchy.

Here's their model in general terms:

Revenue = (Engagement + Follower Bonus + Boost Bonus) * (Read Thru Adjustment)

To maximize earnings, we clearly need to maximize all the terms on the right side of the "=" sign. Let's simplify things a bit and remove the Boost Bonus, since that's not going to happen very often, at least for most of us. Deep diving a bit more, we have:

Engagement = f(Read Time, #Claps, #Highlights, #Responses, #First Time Followers)
Follower Bonus = Multiplier for Readers that are Followers
Read Ratio = % Users Who Read for 30 Seconds or More

When you see "f()" it means "a function of", which is a fancy way of saying we know what the variables are that impact our earnings, but we don't know anything about how they are interrelated or how they each contribute to revenue. Yet.

Notice that "Followers" appears twice, first in the engagement numbers and again as a "Follower Bonus". And this is our first clue — we definitely want to maximize our Followers. How can we do that and can we quantify the value of a follower?

Part 1: How to Increase Followers

Let's see if we can use the data to discover how we might increase our followers, because according to the earnings model, more followers means more earnings. Along the way lets investigate the key concepts of correlation and causation.

Correlation and causation are closely related and frequently misunderstood. Causation is what we're after — we want to know what actions we can take that will actually move the needle and improve our earnings. Correlation means a variable is just along for the ride, and pushing on it won't cause any material changes to the thing we're interested in, which in this case is followers.

Unfortunately, distinguishing between correlation and causation is something of an art. We have some data mining tools at our disposal that will help us, but we also need to use our intuition about how things really work to guide our analysis. We know for sure that more followers will help us — because Medium tells us as much — but how do we gain followers?

Let's use the data to investigate.

Why do some authors have lots of followers?

The median number of followers for each author in our data sample is 840, and the mean is 4,053. Why the big disparity? Median means middle — the value in which half of the samples are above the value and half are below. Mean means average — the total followers across all authors divided by the number of authors. When there is a big difference between the median and the mean, we know that the data is skewed.

Let's look at a histogram for followers. A histogram counts the number of samples between values, where each value range is sometimes called a "bin". We see that virtually all of our authors have fewer than 10,151 followers (the first bin). But one author has 203,000 followers! Lucky him.

Histogram of Medium Author Followers

And again, the middle value is 840. So if you have more than 840 followers you are doing great.

So we know that there are a small number of authors with a huge number of followers. How many authors have more than 10,000 followers? The data tells us it's about 6%.

Medium lore says that about 8% of writers make more than $100 per month. Is it true? Who knows. Medium doesn't share that information publicly. But, given our 6% number above, I'll bet that it is.

How can you join the rarified ranks of those earning more than $100 per month? For starters, let's have a look at the profiles of those authors with lots of followers. What do we find?

Virtually all have written books or are published on mainstream media
They write about general interest topics like self help, starting your own business, making money on Medium, getting rich quick, productivity, and religion
The median number of other authors they follow is 1,225 and the average is 10,265
They publish, on average, 5 stories per week
They have a significant social media presence

Clearly, these authors are in a class of their own — and they skew the data for the rest of us.

Does it help to write a lot of Stories?

Answer: Yes! Which is both not surprising and somehow comforting. We've got to stick with it.

The figure below shows the number of followers versus number of stories written, along with a trend line.

Followers vs Stories

Those pesky successful Medium writers skew the data, but we can filter them out and see a more pronounced trend line more appropriate for us mere mortals.

Followers vs Stories (Filtered)

The slope of the trend line tells us that for every story we write we can expect, on average, roughly 10 new followers. That hasn't proven true for me, but I also don't write about broadly popular topics. In fact, if you've stay with me this far, then here's to you. Plus, there is a lot of scatter in the data, which means there are other variables in play besides number of stories.

Does it help to follow other authors?

I was inspired to write this series when I discovered the notion of follower reciprocity. Does it work? First, let's see if there is a trend line for followers versus following.

Followers vs Following

Indeed, it seems following more authors does in fact suggest you will have more followers. How about if we remove the high-follower authors?

Followers vs Following (Filtered)

Interesting. And for me, a little unsettling. Does this mean that Medium is a den of iniquity, where authors follow one another not because they enjoy their content, but because they are looking for quid pro quo? And does it even matter? Because no one can possibly read all the stories from the 10,000+ authors they are following, and no-read means no-cheddar.

Which is more important, writing stories or following other authors?

Answer: Writing stories.

We can answer this question by creating a linear regression that will predict the impact of both number of stories and number of other authors followed on our own number of followers.

A linear regression, simply put, fits a line to the data, which in our case is this:

followers = k + a * stories + b * following

If the number of stories and the number of following are independent, then we can treat them as causal variables and the coefficients "a" and "b" will tell us their relative impact on the number of followers. "k" is a y-intercept value we can mostly ignore for our case.

I fit the above function to our data using 'R', an open source statistical analysis package popular with data scientists. When checking the "variance inflation factor", we see that the "stories" and "following" variables are relatively independent, meaning they both should have some predictive power. That's good news.

When we fit the above function to the data, we get:

followers = 497 + 9.0 * stories + 0.14 * following

Interesting! Of course, we don't get 497 followers for free (the y-intercept). That is an artifact of fitting a linear function to data that is not perfectly linear.

The relative "value" of a story is "9.0", meaning each story should yield about 9 followers, which is pretty close to what we found earlier. Likewise, for each author we follow, we can expect about 0.14 followers in return. That means roughly 10% of the authors we follow will reciprocate and follow us. Realistically, this is an educated guess because our data also includes "genuine" followers who are not following just because we followed them.

So clearly, writing stories has a lot more bang for the buck, but on the other hand writing stories requires a lot of bucks, so to speak. Clicking on other authors to follow them is a lot easier, but you'll need to be doing a lot of clicking. Plus, it seems a bit unsavory, or perhaps that's just me.

Here's an interesting tidbit in the data - I found one author following nearly 100,000 other authors. And, given my relatively small data set, I bet there are plenty of others with even higher numbers. For reference, by my estimate if you click to follow other authors eight hours a day with one click every two seconds, you'd need well over a year to generate 100,000 clicks.