“API” is a ubiquitous term in modern application development. Technically, it stands for “Application Programming Interface,” but nobody calls it that. It’s a broad term, but it basically defines how one system communicates with another system. I’ll divide it into two subcategories: third party APIs and internal APIs.
Third Party APIs
You probably hear about third party APIs when your engineers are talking about accessing information from another company. All the cool companies have APIs these days that allow outside developers to access a portion of their data. APIs may allow your engineers to read data (e.g., get a list of a user’s Facebook friends), write data (e.g., post to a user’s wall on Facebook), or both.
A company’s API defines what outside engineers can and can’t do with that company’s data. Just because a company has an API doesn’t mean it has an API that does what you really want it to do. Facebook provides an API to post to a user’s own Facebook page, but will not let you automatically post to their friends’ pages. There’s no way around this: if the API doesn’t allow some functionality, your engineers can’t just make it happen (“can’t we just…” is a common refrain I hear from non-engineers). Companies naturally want to protect their intellectual property, and some of their data might be their core competitive advantage. For-profit companies rarely invite the fox into the hen house and let them run loose.
There are also non-technical limitations to third party APIs, typically in the form of legalese. For example, Yelp’s API is pretty liberal in letting you access its rich business data, but it has strict rules around storing it. They basically don’t want you to recreate their entire database on your own and cut them out of the picture. While an engineer can store it as long as they want from a technical perspective, by accessing the API, your company is legally agreeing to abide by their rules. There might also be attribution rules; that is, you must say “Powered by Yelp” with their logo if you use their data. Short of suing you – which they could do – the API provider could cut you off if you get found out so that you can no longer access their API at all. Since they control their own API 100%, you are at their mercy.
Why do companies have APIs in the first place? There are a few reasons. Some companies charge for their APIs, perhaps after you exhaust their free tier. This could be their entire business model. Other companies are in it for the attribution; seeing “powered by Yelp” or Facebook “like” buttons all over the web bolsters their respective brands. Some non-profits and government organizations provide APIs for charitable purposes. Some use it to bolster their own product, such as encouraging more posts on Facebook’s platform through an accessible API. Finally, some do it to get YOUR data. For example, Facebook’s login API gives them an immense amount of data as it allows them to know who is using your service and when they are using it.
It is a good idea to be prudent when building core functionality that is dependent on a third party API because, again, you are at their mercy. When Twitter was young, it provided a powerful API that spurred the creation of many apps and companies. But they later decided to scale back some of their API, killing off some of the businesses that depended on it. Facebook is also known for changing its API frequently. When I was at Coffee Meets Bagel, Facebook decided to change one of their APIs that we needed in order to show users which friends they had in common with their matches, which was a core part of our brand. We worked with Facebook over many months and eventually they provided a partial solution. If they hadn’t built a new API for us (and other companies with similar functionality), we would have had to kill this feature.
So, there’s no API that provides the information you need, but you can see it right there on the internet. Can’t your engineer just write a program to obtain that information in an automated way, the same way you, a human, would?
This is known as page-scraping, and it is sometimes an option. But there are some big limitations:
- It’s significantly more work than integrating an API. APIs are designed to be easily accessed by computer software, whereas websites are designed to be easily accessed by humans. Engineers have to reverse engineer the webpages; that is, study the webpage’s code and figure out how to obtain the info they need.
- If the webpage changes, your app breaks. Your favorite webpage applied a fresh coat of paint? The new design looks awesome, but it broke the scripts that relied on the old structure. Page scraping is highly sensitive to structure; even a small change could break your page scraper. While an API change could also break your application, it’s less frequent and you usually get some warning. Since APIs are designed for such purposes, the owners usually give a heads up to anyone who might be using them.
- The information might not really be publicly available. While you can log into Facebook and see lots of content from your friends, you can’t see information from some guy you don’t know. Neither can a computer program. While perhaps the scraper could log in as you, it would only be able to see the same information that you can see, which is probably not what you want for a scalable business. As a rule, if you have to log in in order to see content, your engineer’s ability to scrape content is likely to be limited.
- You could get blocked or rate-limited. It’s pretty easy for the hosting company to detect that a scraper is scraping their site, so if they don’t want that, they could easily block you, or limit you to only scraping X pages per minute. Even if they don’t mind you accessing their data, scraping a website that has many pages could be taxing on their servers (see Why do engineers worry about scaling?) so they might block your scraper to prevent you from taking down their site.
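To make the “if the webpage changes, your app breaks” point concrete, here is a minimal sketch of a page scraper in Python. The two page snippets, the `price` class name, and the `scrape_prices` helper are all made up for illustration; a real scraper would download the page first and would usually be more involved.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collects the text inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def scrape_prices(html):
    """Hypothetical helper: return every price found in the page."""
    scraper = PriceScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.prices


# The page as it looked when the scraper was written (made-up markup).
OLD_PAGE = '<div><span class="price">$4.50</span></div>'
# The same information after a redesign: different tag, different class.
NEW_PAGE = '<div><p class="cost">$4.50</p></div>'

print(scrape_prices(OLD_PAGE))  # finds the price
print(scrape_prices(NEW_PAGE))  # finds nothing, and raises no error
```

Notice that the scraper doesn’t crash after the redesign; it just quietly returns nothing. That silent failure is part of why scrapers need ongoing attention.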
Page scrapers are fragile, but if an API doesn’t exist, scraping may be your only option. If these limitations are acceptable and you go this route, expect a scraper to require a lot more maintenance to keep working than an API integration.
Internal APIs
Even if your company doesn’t provide third parties with access to your data, you may have an internal API that is only for your own team’s use (having one doesn’t mean other companies can access your data). A common example is an API that your frontend (mobile application or web frontend) uses to communicate with your backend. This is essentially a contract between your frontend and backend teams: the backend team promises that when the frontend invokes a certain API, it will perform a certain action. A “user get” API will return details about a specific user. An “article post” API will create a new article in the backend database with the details sent by the frontend to the API.
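As a rough illustration of that contract, here is a tiny Python sketch. Everything in it is hypothetical: the `Backend` class, the method names, and the fields are invented for this example, and a real internal API would normally be reached over the network rather than by calling a Python object directly. The idea is the same, though: the frontend only relies on what each call promises to do.

```python
class Backend:
    """A stand-in for the backend, keeping its data in memory."""

    def __init__(self):
        # Made-up sample data; a real backend would use a database.
        self._users = {1: {"id": 1, "name": "Ada"}}
        self._articles = []

    def user_get(self, user_id):
        """'user get': return details about one user, or None if unknown."""
        user = self._users.get(user_id)
        return dict(user) if user else None

    def article_post(self, title, body):
        """'article post': store a new article and return it with its id."""
        article = {"id": len(self._articles) + 1, "title": title, "body": body}
        self._articles.append(article)
        return dict(article)


# The "frontend" side of the contract: it calls the API and trusts the result.
backend = Backend()
print(backend.user_get(1))
print(backend.article_post("Hello", "My first article"))
```

The frontend team doesn’t need to know how the backend stores users or articles; it only needs the two promises above to hold.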
While not as rigid as third party APIs since your company owns it, your internal API is probably something your engineering team thinks about carefully. As the contract between your engineering teams’ software, it often makes up a large portion of the discussion amongst your team. Changes to the API are a bit more involved than many other changes, so engineers tend to put more effort into getting it right the first time and making it flexible enough to handle future features with relative ease.
Though your engineers have the ability to change your API however you want, it’s generally not that simple. For example, let’s say you collect a user’s location in order to provide tailored content for that region. Your “user get” API returns the user’s location and the “user put” API lets the frontend update their location whenever the user updates their settings.
But now you want to allow a user to enter multiple locations (say, a home and a work address). Both the frontend and the backend need to make changes to update the location field from one value to multiple. But what happens if the backend updates the API before the frontend has been updated, or vice versa? Unless care is taken, your app will stop working. This is especially problematic for client-side applications like mobile apps, where the user may not update their app immediately. Without handling this case, all old clients will break.
This can really complicate the backend code quite a lot, and often engineers will want to limit how long they will support the old versions of the app or how many old versions they will support at one time. Depending on your company, the engineering cost of supporting the last 5 versions of the app may not be worth the value you’re getting from keeping the app working for the 1% of users who don’t upgrade the app for 3 months.
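Here is a small Python sketch of what “supporting old versions” can look like for the location example. The field names (`location`, `locations`), the version numbers, and both helper functions are assumptions made up for illustration; your engineers may well handle this differently.

```python
def read_locations(payload):
    """Accept both the old and new 'user put' payload shapes.

    Old clients send {"location": "Boston"}.
    New clients send {"locations": ["Boston", "New York"]}.
    """
    if "locations" in payload:
        return list(payload["locations"])
    if "location" in payload:
        return [payload["location"]]
    return []


def user_get_response(locations, client_version):
    """Shape the 'user get' response for the client that asked."""
    if client_version >= 2:
        return {"locations": locations}
    # Old clients only understand a single location, so send the first one.
    return {"location": locations[0] if locations else None}


# An old app and a new app talking to the same backend.
print(read_locations({"location": "Boston"}))
print(read_locations({"locations": ["Boston", "New York"]}))
print(user_get_response(["Boston", "New York"], client_version=1))
print(user_get_response(["Boston", "New York"], client_version=2))
```

Even in this toy version, every function has to know about both shapes. Multiply that by every field that has ever changed and every app version still in use, and you can see why engineers want to retire old versions.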