Do scientists need a fanbase? The #SciFund data says YES!

Note: You have only a scant few days to sign up for #SciFund Round 2!!! Do it now if you’re interested!

So, the burning 5 bazillion dollar question on the mind of me and Jai since the Inception of #SciFund (a process that did indeed look just like this) has been the following:

A vision of the future of science communication

Does having a following as a scientist matter?

or, as Jai likes to put it

Do scientists need a fanbase?

So far, I’ve nibbled at the edges of this – showing that, yes, getting a lot of folk to view your project matters. Having a blog also indeed matters. But we want a better way to drill down into this phenomenon – is the emerging paradigm of “Be visible or vanish!” at play in #SciFund? Do we find evidence of it here?

Fortunately, we’ve got a number of metrics as grist for this mill – number of twitter followers of participants, number of project tweets, Facebook friends, Facebook likes, and blog presence. And we know that ultimately they need to be impacting page-views to have an impact on total raised.

So…what matters? I’ll admit, this is an exploratory analysis, so the first thing I did was a plot of all possible relationships. Just to see. Don’t hate me for not being 100% a priori. There’s a lot there, but the clearest thing to me is that…this should all work. Somehow. Also, intriguingly, within-network measures (i.e., Facebook Likes and Facebook friends) are indeed correlated. But between-network measures don’t appear to be related to one another so much.

For example, let’s look at number of Facebook friends and number of Twitter followers. A quick correlation test shows no relationship (r=-0.146, t=-0.795, df=29, p=0.433). So, the two networks appear to measure different things.
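As a sanity check on those numbers, the t statistic of a Pearson correlation can be recovered from r and the degrees of freedom as t = r * sqrt(df) / sqrt(1 - r^2). Here is a minimal Python sketch of that arithmetic. The function name is my own, and this is not the code behind the original analysis:

```python
import math

def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

# Values reported in the text: r = -0.146, df = 29
t_stat = t_from_r(-0.146, 29)
print(round(t_stat, 3))  # should land close to the reported t of -0.795
```

With a t that small on 29 degrees of freedom, the large p-value is no surprise.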

Which Measure of Audience to Use?

Within networks, we have two different measures. For Facebook, we have Friends and “Likes”. For Twitter we have Followers and Tweets. For #SciFund we want to measure the ability of scientists to cultivate an active community of people interested in their work.

For Facebook, this seems simple. Sure, you have your network. But, particularly given Facebook’s weird filtering schemes, just the size of your network is not going to measure the number of people paying close attention to you. Particularly as people can like you if they are out of your network. Having the number of likes, on the other hand, gives an indication of the number of people whose interest a scientist has captured enough to get them to pay attention to their proposal. So, I’d argue that Facebook Likes is a better metric of the ability of a scientist to cultivate interest in their work.

For Twitter, determining the right metric seemed more complicated. Number of tweets about a project is the obvious choice, but tweets could come solely from a single person tweeting out their own project link, or they could come from retweets. Many of us tweeted every time we got a donor, for example. There’s really no way to separate our own signal from that of our network. I’d argue that it’s probably not the world’s greatest proxy. OK, so, number of Twitter followers. Does number of followers reflect active interest? I’d argue yes. While there are any number of groups that will follow you to try and get you to follow them back, I’d argue that most follows express some degree of genuine interest. Indeed, following on Twitter is a far more dynamic thing than Facebook – a process that may well be reflected by the lack of correlation between Twitter following and Facebook network size.

OK, so, we have our metrics. Let’s model this sucker.

Model, Model, It’s Big, It’s Heavy, It’s Good!

So, we have pre-goal and post-goal pageviews as our metric of import. Starting with pre-goal pageviews, I constructed a model with Twitter Followers, Facebook Likes, and Whether Someone has a blog as predictors of pre-goal pageviews. Nice, simple, additive model. As we’re going with count of pageviews, again, I used a quasipoisson error distribution and a linear identity link function. </statsnerd> What it shows is surprising. Here’s the likelihood ratio test table:

Analysis of Deviance Table (Type II tests)

Response: preGoalPageviews
                                     LR Chisq Df Pr(>Chisq)
TwitterFollowers                       14.116  1  0.0001719 ***
Facebook.likes                         42.693  1  6.404e-11 ***
[blog predictor]                        0.959  1  0.3274524

No blog effect retained! What? Interesting. But how good is the fit of this model? Well, the R2 of the observed to predicted is 0.75. Wow! Paydirt.
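For anyone wondering what “R2 of the observed to predicted” might mean in practice, one common reading is the squared Pearson correlation between observed and fitted values. The sketch below assumes that reading. The numbers in it are made-up toy values, not #SciFund data, and the function name is hypothetical:

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Toy numbers purely for illustration (NOT the #SciFund data)
toy_observed = [120, 340, 560, 910, 1500]
toy_predicted = [150, 300, 600, 850, 1450]
r2_toy = r_squared(toy_observed, toy_predicted)
print(r2_toy)
```

A value near 1 means the fitted values track the observed ones closely. It says nothing on its own about whether the model is overfit.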

I had run this using the earlier outlier-excluded dataset. But, out of curiosity, I added the outlier back in, just to see if these new predictors might better explain the one project with the ridonculously high number of pageviews. Pleasingly, the LR Table hardly changed at all, and the R2 actually *increased* to 0.78. Here’s one way to look at it:

Here point size is proportional to number of twitter followers. So, you can see that, more Facebook likes, more pageviews. What’s interesting is that if you look at two of the biggest deviations from that trendline on the left hand side of the graph, they have very large numbers of Twitter followers. So, you can see the Twitter effect in those residual values. Very cool. And what of the coefficients? Well…

                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           9.1007   260.1092   0.035   0.9723
TwitterFollowers                      1.2060     0.5747   2.098   0.0444 *
Facebook.likes                       19.9301     3.4107   5.843 2.16e-06 ***
[blog predictor]                     83.7213   350.6876   0.239   0.8129

1 Twitter follower = 1 page view. 1 Facebook Like = 20 Page views. This is not to say that Likes have more weight than followers. Likes are an indirect measure of how you’ve been able to engage folk via Facebook. There’s something latent going on there (must…resist…urge…to…use…latent…variables!) so this is an indicator of engaging your Facebook network rather than a 1:1 matching. Interesting nonetheless. Also you can see that non-significant blog coefficient. Sad.
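Because the model uses an identity link, the coefficients simply add up. Here is a rough Python sketch of what the fitted model would predict for a project. The coefficients are copied from the table above. The function name, the 0/1 coding of the blog term, and the example project are my assumptions for illustration:

```python
# Coefficients copied from the coefficient table above (identity link, so effects add).
INTERCEPT = 9.1007
PER_TWITTER_FOLLOWER = 1.2060
PER_FACEBOOK_LIKE = 19.9301
BLOG_EFFECT = 83.7213  # not distinguishable from zero (p = 0.81); 0/1 coding is assumed

def predicted_pre_goal_pageviews(followers, likes, has_blog):
    """Linear predictor of pre-goal pageviews under the additive model."""
    blog_term = BLOG_EFFECT if has_blog else 0.0
    return (INTERCEPT
            + PER_TWITTER_FOLLOWER * followers
            + PER_FACEBOOK_LIKE * likes
            + blog_term)

# Hypothetical project: 500 Twitter followers, 50 Facebook Likes, no blog
example = predicted_pre_goal_pageviews(followers=500, likes=50, has_blog=False)
print(round(example, 1))
```

Given the standard errors in the table, treat any single prediction like this as a ballpark figure at best.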

So, what about post-goal pageviews? We have a smaller dataset here, and the glm with a quasipoisson error distribution choked on it, so, I’m going to relax, have a homebrew, and fit it with a Gaussian distribution. Don’t hate me. I’m also going to include the outlier point because, again, models including it appear to have a better fit (more on that in a second). So, what do we see?

Analysis of Deviance Table (Type II tests)

Response: postGoalPageviews
                                     LR Chisq Df Pr(>Chisq)
TwitterFollowers                        0.489  1     0.4842
Facebook.likes                         84.628  1     <2e-16 ***
[blog predictor]                        0.903  1     0.3419

Still no blog effect. AND no Twitter effect on post-goal pageviews. But likes are still in play. So why?

OK, I have a theory. An untestable theory (sorry!). Remember when I said that Facebook likes are an indicator? Once a project goes ‘viral’ as it were, or really blows up, more people than just the investigator’s immediate network on Facebook may be ‘like’-ing it. We’ve become well trained to click the button. So if more and more folk are looking at an awesome successful project, hitting “Like”, that’s showing the project to THEIR social network, which may be totally unrelated to the initial investigator’s own.

I know, I know, this is arm waving, but, so be it. Post-success hits are a measure of a project really going large. And it’s worth thinking about just how we measure and look at that. Granted, I’d feel better about this if we had pre and post likes, but, oh well.

Here’s the kicker, though. What’s the R2 of the relationship between observed and predicted? 0.95. Suck it, doubt-monkeys!

And FYI, Facebook Likes have roughly the same impact – ~ 20 page views per like (21.2 +/- 2.3052 SE).

Whither the Blog Impact?

The seemingly mysterious thing in all of this is that blog impact seems to have disappeared. Having a blog doesn’t directly lead to page views.

In my brain, this means that the effect of blogs must still be there, but mediated via Twitter and Facebook. This actually makes sense – having a blog and trying to engage your audience will lead to a larger Twitter following. As a promotion tool, it may also lead to more Facebook likes for your project. Or at least a following that is more active and engaged in what you are doing.

So I ran a simple model with a normal error to look at Facebook Likes or Twitter followers as a function of Having a Blog, Posting Frequency, and Age (I threw in Age because, well, why not!). I also included Facebook friends as a covariate for the Facebook analysis.

Having a blog was actually important for Facebook Likes (LR Chisq = 6.413, p=0.0113), but nothing else was. Having a blog increased your number of Facebook Likes by around 109. Which, given the above analysis, translates to ~ 2180 more pageviews both before and after hitting your goal. Quite important! While the model wasn’t a great fit (R2=0.22), it’s on the order of what we were looking at for just raw pageviews. So clearly there’s more here – something we haven’t yet folded into the model.

Posting frequency was the only thing that fell out for the Twitter Follower model (LR Chisq=5.352, p=0.02). For every additional post per month on your blog, it appeared you were likely to grab about 34 twitter followers. Not bad. I need to post more on my blog! Sheesh! This model was a better fit with an R2 of 0.37.
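To make the back-of-the-envelope chaining explicit: an indirect effect along a path is just the product of the coefficients on each step. The sketch below strings together the numbers quoted in this post. The constant and function names are mine, and since the pieces come from separately fitted models, the products are rough guides rather than formal estimates:

```python
# Path coefficients quoted in the text
LIKES_PER_BLOG = 109               # extra Facebook Likes from having a blog
PAGEVIEWS_PER_LIKE_PRE = 19.9301   # pre-goal pageviews per Like
PAGEVIEWS_PER_LIKE_POST = 21.2     # post-goal pageviews per Like
FOLLOWERS_PER_MONTHLY_POST = 34    # Twitter followers per extra blog post per month
PAGEVIEWS_PER_FOLLOWER_PRE = 1.2060  # pre-goal pageviews per follower

def indirect_effect(*path_coefficients):
    """Multiply the coefficients along a path to get the indirect effect."""
    effect = 1.0
    for coefficient in path_coefficients:
        effect *= coefficient
    return effect

blog_to_pre = indirect_effect(LIKES_PER_BLOG, PAGEVIEWS_PER_LIKE_PRE)
blog_to_post = indirect_effect(LIKES_PER_BLOG, PAGEVIEWS_PER_LIKE_POST)
post_to_pre = indirect_effect(FOLLOWERS_PER_MONTHLY_POST, PAGEVIEWS_PER_FOLLOWER_PRE)

print(round(blog_to_pre), round(blog_to_post), round(post_to_pre))
```

The blog-to-pageviews products bracket the ~2180 figure above, which uses the rounded 20 pageviews per Like. The Twitter path is only computed for pre-goal pageviews, since followers had no detectable effect after the goal was hit.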

Additionally, I’d like to point out that there was an indirect effect of having a blog. In a related analysis, I found that most bloggers posted around 4 times a month, but, moreover, as one got older, they were more likely to post more frequently. Indeed, about 1/2 a post per month more for every year beyond 25. When I’m 90, I’m going to be a blogging machine.

The BIG Conclusion to Tie the Room Together

All of this is a long way of saying, we have this really interesting chain of indirect effects here. From blogging to having an online following to that following looking at your stuff, to funding! If we fill in the parameters from this analysis and string the whole thing together into a path diagram, it looks like this:

Final path model. Note, this is not from a single omnibus fit, as different pieces were fit to different subsets of the data. Individual coefficients represent number of units of change in the downstream variable per change in 1 unit of the upstream variable.

Each of those coefficients describes the relationship between one variable and the next – so, units are change in y per change in 1 unit of x. This isn’t a formal SEM (different parts are fit using different subsets of the data), but it’s pretty darned provocative, showing that social engagement ultimately leads to funded research.

So, do scientists need a fanbase? If they’re going to fund their work online, yes! Making that vital connection to the world around you can lead to more successful research, and a vibrant informed community of science-interested folk. So sayeth #SciFund!