Autocompletion with deep learning

July 15, 2019

Update (August 19): We’ve released TabNine Local, which lets you run Deep TabNine on your own machine.

TL;DR: TabNine is an autocompleter that helps you write code faster. We’re adding a deep learning model which significantly improves suggestion quality. You can see videos below and you can sign up for it here.

There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances. To use a surprisingly common idiom among software blogs, the cobbler’s children have no shoes.

TabNine hopes to change this. Here are demo videos in four languages:

[Video: Python demo]

[Video: Java demo]

[Video: C++ demo]

[Video: Haskell demo]

About Deep TabNine

Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behavior, such as type inference in dynamically typed languages.
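The original post demonstrates this with an animation; as a stand-in, here is a minimal Python sketch of the behavior being described (the function and names are invented for illustration):

```python
# Hypothetical example: no type annotations appear anywhere, yet calling
# .strip() on `employee` is a strong signal that `name` is a str, so a model
# trained on Python code will rank string methods such as .title() or
# .lower() highly when the user types `name.`.
def format_name(employee):
    name = employee.strip()
    return name.title()

print(format_name("  ada lovelace  "))  # prints "Ada Lovelace"
```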

Deep TabNine can use subtle clues that are difficult for traditional tools to access. For example, the return type of app.get_user() is assumed to be an object with setter methods, while the return type of app.get_users() is assumed to be a list.
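The screenshot is not reproduced here; the following self-contained sketch (all classes and methods are invented for illustration) shows the two completion patterns being contrasted:

```python
class User:
    def set_email(self, email):
        self.email = email

class App:
    def get_user(self):
        return User()

    def get_users(self):
        return [User(), User()]

app = App()

# After `user = app.get_user()`, the singular method name suggests a single
# object, so setter completions such as `user.set_email(...)` rank highly.
user = app.get_user()
user.set_email("ada@example.com")

# After `users = app.get_users()`, the plural name suggests a list, so
# iteration patterns such as `for user in users:` rank highly instead.
for u in app.get_users():
    u.set_email("team@example.com")
```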

Deep TabNine is based on GPT-2, which uses the Transformer network architecture. This architecture was first developed to solve problems in natural language processing. Although modeling code and modeling natural language might appear to be unrelated tasks, modeling code requires understanding English in some unexpected ways. For example, we can make the model negate words with an if/else statement.
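The demo itself is an animation; a hypothetical reconstruction of the kind of code involved looks like this, where the else-branch string is the completion the model supplies:

```python
# Hypothetical sketch: after seeing the if-branch return "valid", a model
# with some grasp of English can propose the antonym "invalid" in the
# else-branch, even though nothing in the code's syntax requires it.
def describe(password_ok):
    if password_ok:
        return "valid"
    else:
        return "invalid"  # the negated word is the predicted completion

print(describe(True))   # prints "valid"
print(describe(False))  # prints "invalid"
```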

The model also uses documentation written in natural language to infer function names, parameters, and return types.
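Again the original shows an animation; a minimal invented example of the pattern:

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    # Hypothetical illustration: given only the docstring, a model can
    # propose the function name, the parameter name, and this body.
    return (temp_f - 32) * 5 / 9

print(fahrenheit_to_celsius(212))  # prints 100.0
```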

In the past, many users said they wished TabNine came with pre-existing knowledge, instead of looking only at the user’s current project. Pre-existing knowledge is especially useful when the project is small or a new library is being added to it. Deep TabNine helps address this issue; for example, it knows that when a class extends React.Component, its constructor usually takes a single argument called props, and it often assigns this.state in its body.
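The screenshot is omitted here; this self-contained sketch (with React stubbed out so it runs standalone, and the Counter class invented for illustration) shows the boilerplate the model has learned to complete:

```javascript
// Stub so the sketch runs without installing React; in a real project this
// would be `import React from "react"`.
const React = { Component: class {} };

class Counter extends React.Component {
  // The learned pattern: the constructor takes a single argument `props`,
  // calls super(props), and assigns this.state in its body.
  constructor(props) {
    super(props);
    this.state = { count: props.start || 0 };
  }
}

console.log(new Counter({ start: 3 }).state.count); // prints 3
```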

Deep TabNine can even do the impossible and remember C++ variadic forwarding syntax.

Using Deep TabNine

Deep TabNine requires a lot of computing power: running the model on a laptop would not deliver the low latency that TabNine’s users have come to expect. So we are offering a service that will allow you to use TabNine’s servers for GPU-accelerated autocompletion. It’s called TabNine Cloud, it’s currently in beta, and you can sign up for it here.

We understand that many users want to keep their code on their own machine for privacy reasons. We’re taking the following steps to address this use case:

  • For individual developers, we are working on a reduced-size model which can run on a laptop with reasonable latency. Update: we’ve released TabNine Local.

  • For enterprises, we will offer the option to license the model from us and run it on your own hardware. We can also train a custom model for you which understands the unique patterns and style within your codebase. If this sounds interesting to you, we would love to hear more about your use case at enterprise@tabnine.com.

If you choose to use TabNine Cloud, we take the following steps to reduce the risk of data breach:

  1. TabNine Cloud will always be opt-in, and we will never enable it without explicitly asking for your permission first.
  2. We do not store or log your code after your query is fulfilled.
  3. Your connection to TabNine servers is encrypted with TLS.
  4. There is a setting which lets you use TabNine Cloud for whitelisted directories only.

TabNine Cloud is currently in beta, and scaling it up presents some unique challenges since queries are computationally demanding (over 10 billion floating point operations) yet they must be fulfilled with low latency. To ensure high service quality, we are releasing it gradually. You can request access here. Customers of TabNine will be the first to receive access.
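To put those numbers in perspective, here is a rough back-of-envelope calculation (the GPU throughput figure is an assumption, not a measured value) showing why raw compute per query is feasible, yet serving at scale remains hard:

```python
# ~10 GFLOP per query on a GPU sustaining ~10 TFLOP/s (assumed) is about a
# millisecond of pure compute; round trips, batching, and queueing under
# load are what make low-latency serving challenging at scale.
flops_per_query = 10e9          # "over 10 billion floating point operations"
gpu_flops_per_sec = 10e12       # assumed sustained GPU throughput
compute_ms = flops_per_query / gpu_flops_per_sec * 1000
print(f"{compute_ms:.1f} ms per query of pure compute")
```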

Frequently asked questions

This deep learning stuff is cool but I’m skeptical that it can improve over my existing autocompleter which actually parses the code.

You can use both! TabNine integrates with any autocompleter that implements the Language Server Protocol. TabNine will use your existing autocompleter whenever it provides suggestions, and fall back to Deep TabNine otherwise.

What latency can I expect?

You can look at the videos (1, 2, 3, 4) for an idea of the latency. They haven’t been edited or sped up.

What languages are supported?

Deep TabNine supports Python, JavaScript, Java, C++, C, PHP, Go, C#, Ruby, Objective-C, Rust, Swift, TypeScript, Haskell, OCaml, Scala, Kotlin, Perl, SQL, HTML, CSS, and Bash.

Software licenses

Only code with one of the following licenses is included in the training data:

  • MIT
  • Unlicense
  • Apache 2.0
  • BSD 2-clause
  • BSD 3-clause

Licenses are determined per-repository by Licensee, GitHub’s open-source license-detection library.

Acknowledgements

Thanks to everyone who gave feedback on this blog post, and thanks to OpenAI for open sourcing GPT-2.