O.R. and social networking: A Solver Foundation MIP model to suggest Twitter followers

Please note that Solver Foundation is no longer supported.

For this month’s INFORMS blog challenge I thought it would be fun to write a simple Solver Foundation – Twitter mashup. I’m not as happy with it as I could be, but maybe someone will find it interesting.

My goal is to write a web application that will suggest interesting twitter users to follow. Here’s what the finished product looks like:

TwitterSuggest

As you can see, the application asks for:

  • a keyword,
  • a maximum number of twitter users to suggest,
  • minimum and maximum number of expected tweets per day.

and produces a list of twitter users to follow. I will describe some of the key steps in building this application, but I won’t provide all of the source simply because it would be a lot of work for me to clean up the code! I will try to provide enough so that the motivated reader can finish it off. If you want to follow along, get Solver Foundation, open Visual Studio 2010, and create a new web site (say MsfTwitter). Drag the controls shown (several Labels, one CheckBox, three TextBoxes and a Button) on the screenshot onto your main form, then double click on the button (which I named "Suggest"). Then you’ll be sitting in the codebehind file for the click button handler:

  protected void Suggest_Click(object sender, EventArgs e) {
  }

The end goal here is to fill this with some code that will find suggestions. In order to do this, we’re going to have to know something about twitter users. For now, let’s assume that we’re able to obtain the number of tweets per day and an "interest score" that indicates how interesting a twitter user is for a particular topic. Then we can define a User data structure which is the basis for our model. Add a new C# file into the App_Code directory and add this:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Web;

[DebuggerDisplay("{Name}, score = {InterestScore}, tweets = {Tweets}")]
public class User {
  public string Name { get; set; }
  public double InterestScore { get; set; }
  public double TweetsPerDay { get; set; }
}

Our application logic is really divided into two parts:

  1. Finding interesting twitter users for a given topic. This gives us a list of Twitter users, including their interest score and tweets per day.

    2. Selecting a "best" subset of these users given the preferences specified on the web page.

These are actually both great analytics problems. I am going to rely on third party services to solve the first problem, and I will use Solver Foundation for the second (since that is an optimization model). So for now, let’s assume that we’ve got a list of twitter users and we want to solve problem 2. (If you’re following along in Visual Studio, now is a good time to add a reference to Solver Foundation using “Add Reference”.)

Following my standard pattern of identifying inputs, outputs, goals, and constraints, our model is as follows:

  • The input is a bunch of users. Each one has a score and expected number of tweets per day.
  • The output is a boolean value for each user: to follow or not to follow?
  • The goal is to maximize the total interest score over all selected users.
  • One constraint is that the expected number of tweets is between user specified values min_tweets and max_tweets.
  • Another constraint is that the number of users that we select is at most K.

    The next step is to write a class that we can use from the web page. I will call the class SuggestModel and give it one method. The signature is pretty straightforward based on my description above.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.SolverFoundation.Common;
using Microsoft.SolverFoundation.Services;

public class SuggestModel
{
  public static IEnumerable<User> Solve(IEnumerable<User> data, 
int minSuggestions, int maxSuggestions, int maxTweets) { } }

Creating the C# code for Solve is also straightforward given the description of the model. The Parameters for the model are the interest scores and tweets per day. Both are indexed over the set of users. We use the SetBinding method to associate the appropriate fields of the User objects with the parameters.

public static IEnumerable<User> Solve(IEnumerable<User> data, int minSuggestions, 
  int maxSuggestions, int maxTweets) {
  SolverContext context = SolverContext.GetContext();
  Model model = context.CreateModel();
    
  Set users = new Set(Domain.Any, "users");
  Parameter score = new Parameter(Domain.Real, "score", users);
  score.SetBinding(data, "InterestScore", "Name"); // bind to the InterestScore field.
  Parameter tweets = new Parameter(Domain.Real, "tweets", users);
  tweets.SetBinding(data, "TweetsPerDay", "Name"); // bind to the TweetsPerDay field.
  model.AddParameters(score, tweets);
  Decision choose = new Decision(Domain.IntegerRange(0, 1), "choose", users);
  model.AddDecision(choose);
  model.AddGoal("goal", GoalKind.Maximize, Model.Sum(
    Model.ForEach(users, user => choose[user] * score[user])));
  model.AddConstraint("c_choose", 
    minSuggestions <= Model.Sum(Model.ForEach(users, user => choose[user])) <= maxSuggestions);
  model.AddConstraint("c_tweet", 
    Model.Sum(Model.ForEach(users, user => choose[user] * tweets [user])) <= maxTweets);
  context.Solve();
  return data.Where(user => choose.GetDouble(user.Name) > 0.01);
}

Notice the last line: using a LINQ expression we return a list of users where the value of the choose decision is nonzero. That’s what we wanted. The decision values are integer valued and all the constraints and goals are linear, so this is a mixed integer programming model. It’s essentially a knapsack problem – no big deal.

Now we can return to "problem 1": finding interesting twitter users for the topic entered into the text box. To solve this problem I will rely on the technique (first suggested by Polya?) of being lazy – I will use other people’s code.

Twitter provides a simple REST API that is the equivalent of the "Who to Follow" button on twitter.com. You can learn more about it (and try it out) here.

To actually call this “who to follow” twitter API from our program, we need three more things:

  1. A twitter REST API key (so we can make calls to twitter). So I registered my app here: https://dev.twitter.com/apps/new. You can too – it’s free.
  2. Code to make authenticated REST calls. The authentication system is called OAuth. Shannon Whitley wrote a nice OAuth package for .Net and posted it on his blog here. Go grab that and put it in your App_Code directory. I changed the namespace to “OAuth”.
  3. Code to convert the API results (returned in a format called JSON). There is a Json.Net ackage written by James Newton-King that makes this easy. Follow this link for the details. In the code below where I work with JObject/JToken/JArray, add a ”using Newtonsoft.Json.Linq” statement to the top of the source file.

Since the twitter API is authenticated, there is a dance that we have to do to obtain an access token given our API key. This involves redirecting to twitter, allowing our registered application to have access to twitter data, then redirecting back to our page. I am omitting that code because it’s not really the point of this post.

Once you go through all of that mess, the code is short and sweet. Put this in the same file as your click event handler (probably Default.aspx.cs):

  private oAuthTwitter _oAuth;
  private JArray GetTwitterUsers() {
    string url = String.Format(http://api.twitter.com/1/users/search.json?q={0},
HttpUtility.UrlEncode(Keyword.Text)); string json = _oAuth.oAuthWebRequest(oAuthTwitter.Method.GET, url, String.Empty); return JArray.Parse(json); }

This gives me a bunch of interesting users but I don’t have a score or expected number of tweets. We can estimate the number of tweets per day by looking at two fields in the GetRelevantUsers result: statuses_count (the number of tweets for the user account) and created_at (when the account was created). Easy.

The last ingredient is the interest scores. A colleague referred me to Klout, which fits the bill nicely. The documentation for the API is here, but it is very simple: pass a comma-delimited list of user names in the query string and you get back the "klout scores". I will use those as our interest scores. Using the same raw materials as for the twitter REST API, the code looks like this:

  private JToken GetKloutScores(IEnumerable<string> users) {
    string uri = "http://api.klout.com/1/klout.json?key={0}&users={1}";
    string usernames = HttpUtility.UrlEncode(String.Join(",", users)); 
    string json = _oAuth.WebRequest(oAuthTwitter.Method.GET, 
String.Format(uri, _kloutKey, usernames), String.Empty); var result = JObject.Parse(json); return result["users"]; }

Now we have two data sets: one from Twitter and one from Klout. We need to join this information together to get a list of users to pass into SuggestModel. We can do that in a single LINQ Join statement. Again, skipping some OAuth details, here is the code:

  protected void Suggest_Click(object sender, EventArgs e) {
    if (_authorized) {
      if (SetAccessToken()) {
        // todo: read values for these from the TextBox controls.
int minSuggest = 1, maxSuggest = 10, tweets = 100; var twitter = GetTwitterUsers(); var klout = GetKloutScores(twitter.Select(token => (string)token["screen_name"])); IEnumerable<User> data = klout.Join( twitter, k => (string)k["twitter_screen_name"], t => (string)t["screen_name"], (k, t) => new User { Name = (string)t["screen_name"], InterestScore = (double)k["kscore"], TweetsPerDay = (int)t["statuses_count"] /
DaysBetween(t.DateTime("created_at"), DateTime.UtcNow) }); var suggestedUsers = SuggestModel.Solve(data, minSuggest, maxSuggest, tweets);
// todo: do something with the output.
}
}
} private double DaysBetween(DateTime start, DateTime end) { return (end - start).TotalDays; }

A braindead way of displaying the output is to drag a Literal control on the form and  write the output as HTML. We can replace that last “todo” with this:

StringBuilder build = new StringBuilder();
build.AppendFormat("<h2>{0}</h2>", "Suggestions");
foreach (var user in suggestedUsers) {
  build.AppendFormat("<div>{0}, tweets/day = {1}</div>", 
HttpUtility.HtmlEncode(user.Name), user.TweetsPerDay); } apiResponse.Text = build.ToString();

I have made major, major shortcuts all the way along: the error checking is poor, I could consider caching information rather than hitting twitter & Klout each time, the interest scores are not necessarily perfect, the UI and logic is all mixed together, existing follows are not considered, the app should itself be exposed as a REST service, and so on. Even with all the simplifications I have made, a lot of glue code is necessary to tie all of this together. This obscures the relative simplicity of the model, which is a shame.

Author: natebrix

Follow me on twitter at @natebrix.

3 thoughts on “O.R. and social networking: A Solver Foundation MIP model to suggest Twitter followers”

  1. Nate,

    Nice example – I had not looked into the solver set-oriented syntax until now, but will, it is very elegant. Two quick questions:

    1) what is the benefit of declaring Parameters like score, as opposed to working directly off the property of the objects, doing something like
    Model.ForEach(users, user => choose[user] * user.InterestScore)?

    2) in the last line, “return data.Where(user => choose.GetDouble(user.Name) > 0.01);”, why 0.01 and not 0d? Is this to avoid potential double/int conversion loss of precision problems?

    Cheers, and keep up with the awesome work with the MSF,

    Mathias

  2. Hi Mathias,

    Thanks for the message.

    1. In this case, there may not be too much benefit. If you use the same terms in multiple goals or constraints, there is some advantage in using a Parameter since you are guaranteed that the data binding will occur only once.

    2. Just being careful – it’s supposed to be 0 or 1 but in many cases solvers seem to return a very small positive value for 0.

  3. I have tried a similar code , but with solver foundation , the thing is that the code runs fine on a development machine , but the moment the solution is hosted in IIS , the
    context.Solve(); method suddenly hangs (runs for an eternity with no result) for few scenarios , I am still wondering why it happens.

    Have you encountered such behavior with the Solve function in solver foundation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s