How to use KPIs to measure software development productivity

Tabnine Team / 5 minutes / December 21, 2022

There are lots of reasons to measure the productivity of your R&D team, including understanding the overall level of the team’s performance, developing benchmarks, tracking progress, identifying high and low performers, improving processes and operations, justifying your investments, and determining resource allocation. 

But if you’ve ever attempted to measure your software team’s productivity, you’ve probably run into several snags in the process – from trying to define what productivity actually means, to attempting to identify the metrics that reflect that definition.

This post discusses the various challenges involved in measuring dev productivity and offers a KPI strategy to help you really understand the performance of your R&D teams. 

Defining software engineering productivity

Before you can even start to measure productivity, you need to define it. Usually, productivity is defined in terms of inputs and outputs, where you divide your output by your input to get your ROI. Using this method, we could, theoretically, measure the input and output of a software developer as follows:

Input:

  • Salary
  • Number of hours worked

Output:

  • Software features
  • Documentation
  • Deployments
  • Bug fixes
  • PRs

The problem is easy to see: the outputs themselves are difficult to measure. Good measurements of output should have a strong correlation with revenue. However, if we break down the metrics for measuring those outputs, the correlation tends to be pretty weak.

Take lines of code (LOC), one common metric for measuring developer productivity. For several reasons, more isn’t always better. For example:

  • Developer A might write significantly more lines of code than Developer B, but is it good code? Will Developer A’s code introduce more bugs to the system than Developer B’s? 
  • The more code you have, the more resources you need to maintain it
  • Developer A might be less skilled than Developer B, and could be assigned to less challenging features that have lower customer value

Further, the difference between physical lines of code and logical lines of code adds to the complexity involved in measuring LOC. 
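
To make the physical vs. logical distinction concrete, here is a minimal sketch in Python. It assumes one possible definition of "logical" lines (the number of statements in the parsed syntax tree); real LOC tools define this differently, so treat the function names and the counting rule as illustrative assumptions rather than a standard.

```python
import ast


def physical_loc(source: str) -> int:
    """Count every line of text, including blank lines and comments."""
    return len(source.splitlines())


def logical_loc(source: str) -> int:
    """Count statements in the syntax tree (an assumed definition of logical LOC)."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


# The same function written two ways: spread out, and squeezed onto one line.
SAMPLE = '''\
# Compute an order total.

def total(prices, tax):
    subtotal = sum(
        prices
    )
    return subtotal * (1 + tax)
'''

COMPACT = "def total(prices, tax): subtotal = sum(prices); return subtotal * (1 + tax)\n"

for name, code in (("SAMPLE", SAMPLE), ("COMPACT", COMPACT)):
    print(name, "physical:", physical_loc(code), "logical:", logical_loc(code))
```

Both versions contain the same three statements, yet their physical line counts differ by a factor of seven, which is one reason raw LOC is hard to compare across developers or codebases.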

Considering the flaws inherent in defining software development productivity using inputs and outputs, it makes more sense to define it as a measurement of your team’s ability to quickly and efficiently create useful, good-quality software that’s easy to maintain and has high customer value. 

Now that we’ve defined dev team productivity, we can try to figure out how to measure it. 

Identifying the right KPIs

The reality is, there’s no one metric that can be used to assess productivity, since each KPI, on its own, lacks important context. We’ve already discussed some of the problems inherent in measuring lines of code, but there are similar issues involved in all commonly used productivity metrics:

  • Code-based metrics

    In addition to lines of code, other code-based metrics include commits, pull requests, code review turnaround time, code churn (also referred to as “rework”), code coverage, closed change requests, and bug fixes. While all of these metrics are possible indicators of productivity, the same issues apply: They’re all missing important context and can be gamed in a variety of ways that almost always sacrifice quality.

  • Velocity

    Velocity measures the amount of work your team can complete during an average sprint. The team assigns points to each story (based on estimated complexity, risk, and repetition), then calculates the average number of points completed per sprint over a period of time (minimum 5 sprints). This is one way to track your team’s overall progress and figure out how realistic your team’s goals are.
    However, if we look at how we’ve defined productivity, it’s easy to see how quality can be sacrificed to maintain or improve velocity. Cognitive bias can also play a part by inflating the number of points assigned to each story.

  • Function points

    Function points are units that measure the functionality a software service or product delivers to its end users, meaning what the software can do in terms of tasks and services. The advantage of function points is that they are relatively agnostic to both the technology and the development model used. These points are assigned using Function Point Analysis (FPA) rules, which cover 5 components: external inputs (EI), external outputs (EO), external inquiries (EQ), internal logical files (ILF), and external interface files (EIF). However, since the team assigns the function points, they’re subjective and can be manipulated.

  • Pull requests

    Pull request metrics, such as time to merge, lead time, size, flow ratio, and discussions, have increased in popularity recently as a way to measure development productivity. However, tracking these metrics doesn’t factor in the effort or impact of the work performed. It can also be unfair to developers who are working on a legacy codebase, compared to developers working on a greenfield project.

  • Sprint burndown chart

    This is a graphic representation of how much work has been completed during a sprint and the total amount of work remaining in the sprint. While good for identifying issues such as scope creep and oversaturation of features, it lacks several important factors as well as the context needed to get the full picture.

  • Evaluations

    Both manager and peer evaluations are a good way to give real context to the performance of your team. However, they are highly subjective and can be subject to abuse or bias.
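
Two of the metrics above, velocity and the sprint burndown, reduce to simple arithmetic. The sketch below is a rough illustration only: the function names and the sample point values are hypothetical, and the 5-sprint minimum is taken from the velocity description above.

```python
from itertools import accumulate
from statistics import mean


def average_velocity(points_per_sprint, min_sprints=5):
    """Average story points completed per sprint, over at least `min_sprints` sprints."""
    if len(points_per_sprint) < min_sprints:
        raise ValueError(f"need at least {min_sprints} sprints of history")
    return mean(points_per_sprint)


def burndown(total_points, completed_per_day):
    """Points remaining at the end of each day of a sprint."""
    return [total_points - done for done in accumulate(completed_per_day)]


# Hypothetical history: story points completed in the last five sprints.
history = [21, 18, 25, 20, 16]
velocity = average_velocity(history)
print("average velocity:", velocity)

# A planned sprint of 40 points is double the historical average,
# which suggests the goal may not be realistic.
planned = 40
print("planned / velocity:", planned / velocity)

# Hypothetical daily progress for that sprint, as data for a burndown chart.
print("remaining per day:", burndown(planned, [5, 0, 8, 10, 7]))
```

Neither number says anything about quality or customer value, which is why the text above treats them as partial signals rather than productivity scores.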

Combining weighted KPIs to measure your team’s productivity

Every company and development team is different, with its own set of dynamics and processes. Using a combination of the above metrics, then weighting them (and fine-tuning the weighting over time) in terms of their importance to your team and company leadership, is a good way to measure your team’s overall performance, dynamics, and the efficiency of your processes. 

For example, you can create a set of criteria, such as correlation with revenue, continuous delivery, high quality, function to user, team cooperation, objectiveness, etc.

Then score each metric on a scale of 1-5, based on how well that metric meets the above criteria. For example, you might give pull request metrics a high score for team cooperation, but a low score for correlation with revenue. While function points could score high for revenue correlation, they might score low for objectiveness.

Once scored, you can weight these KPIs using the scores, in order to reach a final productivity score that more accurately reflects your team’s performance. In addition, multiple metrics are less subject to abuse and bias than any single metric.
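
The scoring and weighting steps can be sketched in a few lines of Python. This is one possible interpretation, not a prescribed formula: it assumes each KPI's weight is its share of the total criterion scores, and that each KPI's measured value has already been normalized to a 0-1 range. All criterion scores and KPI values below are made up for illustration.

```python
# Hypothetical 1-5 scores: how well each KPI meets each criterion.
CRITERION_SCORES = {
    "pull_requests": {
        "revenue_correlation": 2, "continuous_delivery": 4, "quality": 3,
        "team_cooperation": 5, "objectiveness": 4,
    },
    "function_points": {
        "revenue_correlation": 4, "continuous_delivery": 3, "quality": 3,
        "team_cooperation": 2, "objectiveness": 2,
    },
    "velocity": {
        "revenue_correlation": 3, "continuous_delivery": 4, "quality": 2,
        "team_cooperation": 4, "objectiveness": 3,
    },
}


def kpi_weights(criterion_scores):
    """Turn criterion scores into weights that sum to 1 (assumed weighting rule)."""
    totals = {kpi: sum(scores.values()) for kpi, scores in criterion_scores.items()}
    grand_total = sum(totals.values())
    return {kpi: total / grand_total for kpi, total in totals.items()}


def productivity_score(normalized_kpis, weights):
    """Weighted sum of KPI values, each assumed to be normalized to the range 0-1."""
    return sum(weights[kpi] * normalized_kpis[kpi] for kpi in weights)


weights = kpi_weights(CRITERION_SCORES)

# Hypothetical team-level KPI values for one quarter, already normalized to 0-1.
team_kpis = {"pull_requests": 0.8, "function_points": 0.5, "velocity": 0.6}

print("weights:", weights)
print("productivity score:", round(productivity_score(team_kpis, weights), 3))
```

Fine-tuning the weighting over time then amounts to revisiting the criterion scores (or adding per-criterion importance factors) and recomputing the weights.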

It’s also important to measure the entire team rather than each individual member, since the true scope of software team productivity is far larger than any one developer on your team.

Yes, each team member’s success has real value, but since we’re trying to increase the creation of useful, high-quality, easy-to-maintain software, many more factors must be taken into account in order to get a real understanding of your team’s productivity.

Summary

Since single KPIs can be misleading when measuring software engineering productivity, we recommend using a combination of weighted KPIs, which can offer more in-depth and nuanced insights into your software team’s performance. 

About Tabnine

Since launching our first AI code assistant in 2018, Tabnine has pioneered generative AI for software development. Tabnine helps development teams of every size use AI to accelerate and simplify the software development process without sacrificing privacy and security. Tabnine boosts engineering velocity, code quality, and developer happiness by automating the coding workflow through AI tools customized to your team. With more than one million monthly users, Tabnine typically automates 30–50% of code creation for each developer and has generated more than 1% of the world’s code.

Unlike generic coding assistants, Tabnine is the AI that you control:

It’s private. You choose where and how to deploy Tabnine (SaaS, VPC, or on-premises) to maximize control over your intellectual property. Rest easy knowing that Tabnine never stores or shares your company’s code.

It’s personalized. Tabnine delivers an optimized experience for each development team. It’s context-aware and delivers precise and personalized recommendations for code generation, code explanations, and guidance, and for test and documentation generation.

It’s protected. Tabnine is built with enterprise-grade security and compliance at its core. It’s trained exclusively on open source code with permissive licenses, ensuring that our customers are never exposed to legal liability.

Tabnine provides accurate and personalized code completions for code snippets, whole lines, and full functions. The Tabnine in IDE Chat allows developers to communicate with a chat agent in natural language and get assistance with various coding tasks, such as:

  • Generating new code
  • Generating unit tests
  • Getting the most relevant answer to your code
  • Mentioning and referencing code from your workspace
  • Explaining code
  • Extending code with new functionality
  • Refactoring code
  • Documenting code
  • Onboarding agent and more

Learn more about how to use Tabnine AI to analyze, create, and improve your code across every stage of development.

Try Tabnine for free today or contact us to learn how we can help accelerate your software development.