Software Engineering in Large Organizations: Requirements
Figuring out what the heck it is you are going to do is the most important part of any project, isn’t it? That’s the case whether you are fixing a toilet (which I outsource) or building the next version of a software package. As a team lead I always wanted everyone on the team to be able to describe in a couple of sentences what it is that we are trying to achieve, and their role in it.
Requirements are invented things. Everyone knows this, but at small companies people tend to forget to write them down, and at big companies people tend to forget that they can be changed. Either way, it’s important to keep in mind that requirements must be clearly articulated; otherwise they will surely not be met. In other words, we need to write down requirements so that we will be able to determine whether we have met them. Once they’re written down, requirements documents can start to seem like stone tablets. But requirements do not exist for their own sake; they exist to realize a larger vision. If we can’t explain why a requirement is a requirement, then something has gone wrong. It’s common to question requirements later on in a project – perhaps they are too hard to implement. That’s okay to some extent, but in many cases people are just a bit too sloppy about thinking through (and writing down) requirements at the start of a project.
Where do requirements come from? We already said they are invented, but how do we invent them? Big companies draw from many sources:
- User surveys
- Instrumentation
- Senior management
- Customer advisory boards
- Partner feedback
The “Building Windows 8” blog is a wonderful public example of such artifacts, and it gives a true sense of how requirements are developed for a “big league” project. The basic idea is to be like a five-year-old: keep asking “why?” until you get to axioms.
Also notable is the Russian doll-like nesting of requirements inside of requirements. Even a much smaller project like Solver Foundation cannot simply have one requirement document. Requirements are typically written down in increasingly narrow form: from a vision document to scenarios to themes to features to specifications. A specification, or “spec”, is the fundamental unit of requirements at a larger software shop. It describes a unit of work specified by a single PM, implemented by a single developer, and tested by a single tester during a single product release (or milestone).
Specs should:
- provide justification for the feature
- state goals
- state non-goals
- define user scenarios
- imply a self-contained unit of work
- specify integration points
- describe performance goals
- be written for the engineering team
- be self-contained
They should not:
- describe how a feature is built
- be a list of APIs
- be written in “business speak”
- be written for management
Sad to say, it is often the case that a spec ends up beginning with an amateurish mishmash of MBA gobbledygook and ending with a hastily cut-and-pasted set of API signatures, with comments from ten different people in the margins. Few things are less useful and more depressing than a spec of this kind. Be clear, don’t try to impress, and justify your reasoning. Write for someone who is smart but is not intimately familiar with your product and team history. After all, those are the kinds of people who will be using the thing you’re trying to build.
NCAA tournament prediction using SAS
I don’t watch as much basketball as I used to, so for the past few years I have been writing programs to do my picks. Two years ago I did very well, and last year was a complete disaster. This year, since I am in a new role at a new company, I chose a new language and a different approach.
I gave myself two hours to write SAS code to generate picks. This was something of a challenge because I am an inexperienced SAS programmer! As in past years I am using simple season-level statistics for all of the teams in the tournament. I obtained the following data:
- Win-loss records for each team.
- RPI rating.
- Strength of schedule.
- Points scored per game.
- Points allowed per game.
- Number of games.
This year’s twist is that I am also incorporating the “wisdom of the crowds” by using, for each round, the percentage of all ESPN.com bracket entries that picked each team to win.
For each round, I compute a score for each team. The team with the higher score advances. Here’s how I arrive at a score:
- Compute the “adjusted points per game” and “adjusted points allowed per game” by scaling each team’s points for and against so that together they add up to twice the average points per game across all NCAA games this year (72.8). This means that if a team scores 50 per game on average and allows 45, its adjusted figures come out to roughly 77-69.
- Using the adjusted PPG figures, apply the “pythagorean formula” with an exponent of 11.5.
- Add this to the RPI. I call this the “base score”. Everything up until this point is similar to my entry from two years ago.
- When computing the “score” for a team in a round, if the team ranks in the top K by the percentage of ESPN.com brackets that picked it to win that round, we multiply its base score by 100. K starts at 16 for the first round (with 64 teams) and is halved each round, down to a minimum of 4. This means that if the base score for team A is lower than that of its opponent B, but the public strongly believes that A will advance, we ignore the base score and choose A.
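The scoring steps above can be written compactly as equations. This is a reading of the description and of the final team_stats data step in the code, not a formula the post itself states. The symbols F, A, \bar{p}, P, B, S_r, K_r and rank_r are labels introduced here for illustration, and the exponent 11.5 is the one given in the text. F and A are a team’s season points for and against (PointsOff and PointsDef), \bar{p} = 72.8 is the NCAA-wide average points per game, and rank_r is the team’s rank (Rank1 to Rank6) when teams are sorted by the share of ESPN.com brackets picking them in round r.

```latex
% Scaled points for and against (AdjPointsOff, AdjPointsDef)
F' = \frac{2\bar{p}}{F + A}\,F, \qquad A' = \frac{2\bar{p}}{F + A}\,A, \qquad \bar{p} = 72.8

% Pythagorean term and base score (Pythag, Score)
P = \frac{(F')^{11.5}}{(F')^{11.5} + (A')^{11.5}}, \qquad B = P + \mathrm{RPI}

% Round score (s1 to s6) and the popularity cutoff K_r
S_r = \begin{cases} 100\,B & \text{if } \mathrm{rank}_r \le K_r \\ B & \text{otherwise} \end{cases},
\qquad K_r = \max\left(2^{5-r},\, 4\right), \quad (K_1, \dots, K_6) = (16, 8, 4, 4, 4, 4)

% Worked example: 50 scored and 45 allowed per game (season totals give the same result)
F' = \frac{145.6}{95}\cdot 50 \approx 76.6, \qquad A' = \frac{145.6}{95}\cdot 45 \approx 69.0,
\qquad P = \frac{(10/9)^{11.5}}{(10/9)^{11.5} + 1} \approx \frac{3.359}{4.359} \approx 0.771
```

Because F' and A' are F and A multiplied by the same factor, that factor cancels in P, so the Pythagorean term depends only on the ratio F/A.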
Once the tournament starts you should be able to view my picks here: http://tournament.fantasysports.yahoo.com/t1/2239853
Unfortunately, these picks are INCREDIBLY BORING: all of the #1 and #2 seeds advance to the Elite Eight. I am somewhat disappointed that it turned out that way, but that’s how the code worked out. The Final Four is Kentucky, Michigan State, Syracuse, and North Carolina, with Kentucky beating Syracuse in the final. A couple of other observations:
- Most of the code is data preparation. The raw data files referenced in the code are screen scraped from espn.com and ncaa.com. You can download them below – rename the extensions to csv to use. I did a little bit of massaging of the data files, but I tried to do most of that in SAS.
- I intentionally tried to avoid macro language. I didn’t worry about trying to factor the code nicely (as you can see by the repeated “team canonicalization” code).
- I found the code to read in the more unstructured data files a bit tedious to write. I could have coded it much faster in C#.
- I probably should have left out the bit about RPI. That would have made the picks more interesting. RPI is already a prediction model so we essentially have a “model of a model”.
- I wanted to add a bit more code at the end that parses bracket.csv (which is a list of all the teams with opponents grouped next to each other as in the bracket) and computes the winners for each round. This is not that hard to do (I don’t think), but I ran out of time. A SAS person could whip this up easily, but as a C/C++/C# guy I still tend to think about this procedurally (and would want to just write a few for-loops).
- PROC MEANS is awesome.
- In order to continue to make this interesting I will need to consider per-game data rather than per-season data. Unfortunately that takes more time and I generally don’t have a lot of that to spare.
Here’s the code:
%macro DoPicks;
data espnpicksraw;
array r{6} $ 50;
infile 'c:\temp\ncaa\espnpicks.csv' dlm=',';
input r{1} r{2} r{3} r{4} r{5} r{6};
run;
data espnpicks;
length Pct 8 Round 3;
set espnpicksraw;
array r{6} r1-r6;
do Round = 1 to 6;
* example: "1 Kentucky - 99.4%" *;
*Seed = scan(r{i}, 1, ' ');
* Strip the percent sign and convert the text to a number explicitly. *;
Pct = input(compress(scan(r{Round}, -1, ' '), '%'), best12.);
first = find(r{Round}, ' ', 1);
* A negative start position makes FIND search right to left. *;
last = find(r{Round}, '-', -length(r{Round}));
Team = substr(r{Round}, first, last - first);
Team = compress(strip(tranwrd(tranwrd(upcase(Team), "SAINT", "ST"), "STATE", "ST")), '.');
output;
end;
keep Team Round Pct;
run;
proc sort data=espnpicks;
by Team Round;
run;
proc transpose data=espnpicks out=espnscores(drop=_Name_) prefix=Round;
BY Team;
VAR Pct;
ID Round;
run;
%local r ;
%do r = 1 %to 6 ;
proc sort data=espnscores;
by descending Round&r.;
run;
data espnscores;
set espnscores;
Rank&r.=_N_;
run;
%end;
data espnscores;
retain Team Round1-Round6 Rank1-Rank6;
set espnscores;
run;
proc sort data=espnscores;
by Team;
run;
data rpi;
infile 'c:\temp\ncaa\rpi.csv' dlm=',' dsd;
length Rank 3 Team $ 30 Wins 4 Losses 4 RPI 8;
length SOS 8 SOS_Projected 8 SOS_Rank 8 SOS_Projected_Rank 8 Conference $ 25 Schedule $25;
input Rank Team Wins Losses RPI SOS SOS_Projected SOS_Rank SOS_Projected_Rank Conference Schedule;
Team=compress(strip(tranwrd(tranwrd(upcase(Team), "SAINT", "ST"), "STATE", "ST")), '.');
keep Team Wins Losses RPI;
run;
data points_off;
infile 'c:\temp\ncaa\points_off.csv' dlm=',' dsd;
length Rank 3 Team $ 30 Games 4 WinsLosses $12 PointsOff 8 PPGOff 8;
input Rank Team Games WinsLosses PointsOff PPGOff;
Team=compress(strip(tranwrd(tranwrd(upcase(Team), "SAINT", "ST"), "STATE", "ST")), '.');
keep Team Games PointsOff PPGOff;
run;
data points_def;
infile 'c:\temp\ncaa\points_def.csv' dlm=',' dsd;
length Rank 3 Team $ 30 Games 4 WinsLosses $12 PointsDef 8 PPGDef 8;
input Rank Team Games WinsLosses PointsDef PPGDef;
Team=compress(strip(tranwrd(tranwrd(upcase(Team), "SAINT", "ST"), "STATE", "ST")), '.');
keep Team PointsDef PPGDef;
run;
data bracket;
infile 'c:\temp\ncaa\bracket.csv';
input Line & $50.;
first = find(Line, ' ', 1);
last = find(Line, '(', first);
Team = substr(Line, first, last - first);
Team = compress(strip(tranwrd(tranwrd(upcase(Team), "SAINT", "ST"), "STATE", "ST")), '.');
ID = _N_;
keep ID Team;
run;
*****;
proc sort data=rpi;
by Team;
run;
proc sort data=points_off;
by Team;
run;
proc sort data=points_def;
by Team;
run;
proc sort data=bracket;
by team;
run;
* Get data for the 68 tournament teams. *;
data team_stats;
merge bracket rpi points_off points_def espnscores (in=in_espn);
by Team;
if in_espn;
if ID;
run;
* Find the average number of points per game over all teams. *;
proc means noprint data=team_stats n mean max min range std sum;
var PointsOff Games;
output out=ncaa_points(keep=_Points _Games)
sum(PointsOff Games)= _Points _Games;
run;
* The average is in _PPG. I use the value 72.8 below. *;
data ncaa_points;
set ncaa_points;
_PPG = _Points/_Games;
run;
data team_stats;
set team_stats;
array s{6} s1-s6;
array r{6} Rank1-Rank6;
PointsTotal = PointsOff + PointsDef;
* Scale points for and against based on NCAA average. *;
AdjPointsOff = (72.8 * 2 / PointsTotal) * PointsOff;
AdjPointsDef = (72.8 * 2 / PointsTotal) * PointsDef;
* Pythagorean expectation with the exponent of 11.5 described above. *;
Pythag = AdjPointsOff ** 11.5 / (AdjPointsOff ** 11.5 + AdjPointsDef ** 11.5);
* The base score. *;
Score = Pythag + RPI;
do Round = 1 to 6;
* If this team was very popular on ESPN.com, jack up the score. *;
if r{Round} <= max(2**(5-Round),4) then s{Round} = 100 * Score;
else s{Round} = Score;
end;
run;
proc sort data=team_stats;
by id;
run;
%mend;
%DoPicks;
Software Engineering in Large Organizations: Software Lifecycle
(This is part two in a series.)
The software lifecycle is the same wherever you go. It’s one of those things you are taught in school that really is the way it’s described. The steps are basically to figure out what you need to do (requirements), how it will work (design), do it (implementation), and verify you’ve done it right (testing). Then you’ve got to get the finished product to the people you promised to get it to (deployment). When you lay it out sequentially it feels very much like a waterfall development model, but of course all the same steps happen in agile as well. There’s been a big move towards agile over the past ten years, and big companies are no exception. You do encounter a lot of “faux agile” as well (fauxgile?) – a team I was once a part of held hour-long “scrum meetings” with 15 or 20 managers sitting in a room with laptops. I digress.
At a big operation, each stage is documented according to organizational or team standards. This is a good idea, and it’s vital if you want to be able to share institutional knowledge among past and future team members, seed the localization and user documentation teams with good information, and form the genesis of patent applications.
Accountability for different stages in the software life cycle is divided among the team. Teams are large enough to permit specialization. At Microsoft (and other places besides), the three primary job descriptions are “program manager”, “developer”, and “tester” (or QA). Program managers are accountable for requirements, developers for implementation, and QA for testing. All three disciplines are involved in all stages, but each discipline takes its turn in the spotlight as concepts move from vague notions to concrete implementation. Different outfits divide these responsibilities in different ways. The concept of a “program manager” (as opposed to project manager) was essentially invented at Microsoft and is not universal. Some teams combine dev and QA responsibilities. Other teams include operations (accountable for deployment) in the core engineering team.
This separation of powers feels like the division between the legislative (PM), executive (dev), and judicial (QA) branches of government. As in government, tension sometimes exists between the three branches. Some amount of this is natural and healthy because, after all, software engineering is an activity that is undertaken with limited resources under changing conditions. Tradeoffs are necessary, and figuring out how and when to make them naturally leads to differences of opinion. One difference between engineering teams and governments is that in an engineering team there is a fourth party sitting above all the others: management. Management, if it is to be useful, should step in when necessary to remind all three disciplines of their common mission and purpose, and to make the judgment calls that are necessary to keep them on track. They’re in a good position to do that when the mission is clearly defined, when they can articulate it, and when they can relate it to the day-to-day work that their team is being asked to do. (Knuth: “the psychological profiling of a programmer is mostly the ability to shift levels of abstraction, from low level to high level. To see something in the small and to see something in the large.”)
I don’t know about you, but I’d rather be a president than a legislator or a judge. I always liked being a dev. It’s common for devs to feel like they are special – I remember a conversation early in my Microsoft career where a more senior dev told me that devs were special because they were the only ones who could perform the other two job functions. I have found that this is not really true – a great PM could be a dev or a tester, and a great tester could be a PM or a dev. This makes sense because in order to do your job well, you need to understand how the work you do fits into the larger story. It is common for Microsoft employees to change from one discipline to another in the course of their careers.
A “triad” of PM, dev, and tester forms a basic unit that can take a portion of a product (a feature) from start to finish. It’s become more common in certain divisions at Microsoft to make this partnership more formal by calling this triad a “feature crew”. They meet regularly from the inception of the project to the very end, reviewing each other’s work and tracking its progress together. Opinions vary on whether formal feature crews are a good idea or simply bureaucracy, but I liked them. Camaraderie develops within the triad, which is enjoyable and effective.
Next time (whenever that is) I will talk a bit about the first stage of the process: requirements.
Software Engineering in Large Organizations
This afternoon I gave a talk at the University of Iowa ACM conference, where I spoke about software engineering in large organizations. It’s a topic I enjoy speaking and writing about, and I was particularly enthusiastic because the audience was mostly undergrads and grads in the CS department. I tried to resist the temptation to simply tell “war stories” for 75 minutes and I almost succeeded.
The premise behind the talk is that in a healthy organization, team success and professional development go hand-in-hand, but in practice the reality often differs from the ideal. A key to success and professional fulfillment is to build hard and soft skills that allow you to achieve team goals as well as individual growth in the face of these realities.
Team success and individual professional development are clearly beneficial to everyone involved, and not at all impossible to achieve simultaneously. An organization that makes long-term investments in talented people working on clearly defined goals, giving them the opportunity to take on and conquer big challenges, is likely to succeed on both counts. (Not incidentally, it’s impossible to pull this off without creating a positive, encouraging, lively work environment.) Unhappily, it’s often the case that organizations collectively apply tactical thinking to ill-defined or changing goals, leading to poorly managed projects with periods of tedium followed by “death marches”, caffeine and cold pizza.
Employees of large organizations are not unique in facing these challenges, but the impact may be more acute because simple math says that they are likely to have less control over their professional environment. The serenity prayer comes to mind. Nevertheless, large organizations provide tremendous advantages. Big companies draw outstanding talent and are able to provide those people with all the tools they need to do their jobs. There’s a lot going on – the diversity of interesting, relevant projects at Microsoft continued to inspire and amaze me year after year. The same is true at Nielsen and other companies. Big companies tend to have formal processes in place for employee evaluation and development. They can be great places to learn new skills, be they technical or interpersonal.
In order to understand why software development at a big company is different from a startup, you need to think about what the job requires. Software development is a creative activity requiring engineering discipline. There are huge differences in the skill level of coders, even out in the professional world. Ubercoders do exist. But that is not the key factor determining success. I like to think of software engineers in terms of the following axes:

No matter where you work, you want to be a pro – in the upper right hand corner. Engineering discipline and creativity can absolutely coexist – and the traces of both are plainly evident in technology that truly inspires, be it the iPad, the Kinect, or whatever. Engineering discipline is simply more important (in a relative sense) in large organizations than in smaller ones, and for sound reasons. Large companies have different considerations. Big companies have big teams working together towards common goals, and they all must march in step. The cost of failure is often higher. Typically the team needs to support a large past body of work, such as a previous version. Mistakes can have consequences that last years. The list goes on.
A potentially uncomfortable but eminently fair example is Windows Vista versus Windows 8. I need to be careful here: I have never been a part of the Windows team and I don’t know anything that you don’t, and even if I did I would not reveal anything about a company that I no longer work for but still root for. Furthermore, Windows 8 has not shipped, and opinions may differ as to whether it will be enough of a success to stave off Apple, etc. But I am sure that when it ships, Windows 8 will be a precise embodiment of a vision that was laid out years ago; that the choices that were made can be justified with data; that it will ship on time and with high quality; and that it will be something the team who built it will be exceedingly proud of. It is a matter of public record that this was not the case with Vista. Why the difference? I will tell you what is not the case. It is not the case that Microsoft fired all the Vista people and hired new programmers (although there is new leadership). It is not the case that the existing team became significantly better programmers, testers, and designers. It is not the case that there was a lack of creative, interesting ideas during Vista development. (Au contraire, mon frère.) What happened was that the team came to a collective understanding of the importance of engineering discipline in all phases of the software cycle. Change is hard and not without cost, as is described in the recent Business Insider article on Microsoft (check it out). But I am sure that the costs will be repaid many times over by the work that has resulted from these changes. I am sure that many of those who stuck with the changes are better software engineers too.
This was the message behind the first portion of the talk. In the remainder of the talk I walked through the software lifecycle, giving my best account of how things work in a big organization and doing my best to explain why, with an aim towards identifying skills and techniques that improve one’s game. As time permits, in future posts I will share some of my thoughts from the remainder of the talk.