I have a brilliant idea!!! What if we could do something like X but for industry Z?!
After a lot of research, we found a real pain point in the industry that we're sure we can solve with the right tech.
This is an example of how a startup is born.
After the initial market research and receiving funding, it’s time to get our hands dirty and develop our product. This article is going to cover:
- The different stages of security at a startup
- Solutions for scaling up
- A teaser for the system we are building
STAGE 1: At first we are in the honeymoon phase: let's build the POC and show the market what we can do! Security? We'll put it in later, after we survive.
STAGE 2: Let's call a friend who knows security, to be reassured that what we are doing is sound. Our security friend will look at our architecture design, ask some questions, and say he needs more time to really understand the system. For now this is enough, and we can continue.
STAGE 3: A big client is asking for certain security certifications. We hire a person to handle this noise and let us focus on growing the startup.
STAGE 4: More security issues start piling up, and we understand the need for better security inside the company. The new security team has plenty to do, in addition to handling the legacy problems left over from the previous stages.
STAGE 5: As more and more people use our product, bugs, feature requests, and sales requests pile up as well. We understand the importance of security, but we have to balance security requests against all these other requests.
We are outnumbered!!!
In reality, the number of security personnel will always be a small fraction of the total number of engineers, even in the most secure and security-minded startups and companies.
This will always come down to risk analysis and decision-making of what’s best for the company.
How can we help our new security team scale up?
It starts with understanding that the security team needs to be a real, integral part of the product, with proper resources and support for its efforts, but also with buy-in and trust from the product and engineering teams. We need to build a relationship that accepts and includes security in the development process.
Most importantly, we need to properly define and understand the process of pushing code into production.
There are different types of companies with different engineering practices and needs:
- Some engineering teams push daily changes to production. Companies practicing CI/CD and rapid deployments need vast automation to support this, and must decide in advance what happens when a security tool finds a known vulnerability, a security bad practice, or, harder still, a hint of an unproven vulnerability.
- Some teams have defined points in time when they can push to production, usually at the end of a sprint. This allows more time, enabling security teams to validate more thoroughly, but do they have the capacity to validate it all?
- Some push into canary/staging environments, giving security testers and beta users time to validate that the system works properly and does not expose holes. How much time do we give our security testers access? Do we open it up to bug bounty programs?
No matter which type of CI/CD you choose, we want all of our code to go through a proper validation process before it is exposed to the world.
In a near-perfect world, code pushed by developers would receive immediate feedback from security tools, whether on the developer's machine or centralized in the CI/CD environment.
Then, when ready, the code is uploaded to a staging/canary environment where it can be verified as part of the whole ecosystem (this becomes especially important in microservices architectures). Ideally it should be exercised by some form of dynamic testing, fuzzing, and common well-known attack scenarios.
Next, we need to verify that the entire end-to-end QA suite passes, and record those runs to get an indication of real data flowing through the systems. This can be done with in-memory recording, instrumentation hooks, network captures, or any other solution suited to collecting the behavior of our code under simulation.
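A minimal sketch of what "collecting behavior" can mean in practice: a decorator that records which functions fire while the QA suite runs. Real instrumentation agents hook at the runtime or bytecode level; the names here (`trace`, `lookup_user`) are purely illustrative:

```python
# Toy instrumentation hook: records which functions ran during a
# simulated QA pass, so releases can be diffed against each other.
import functools

observed_calls: set[str] = set()

def trace(fn):
    """Wrap a function so each invocation is recorded by name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        observed_calls.add(fn.__qualname__)  # record that this function ran
        return fn(*args, **kwargs)
    return wrapper

@trace
def lookup_user(user_id: str) -> str:
    # Illustrative handler; in a real service this would hit a database.
    return f"SELECT * FROM users WHERE id = '{user_id}'"
```

After the QA suite runs, diffing `observed_calls` against the previous release is one cheap way to spot newly exercised code paths.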
All the data collected above should give us a better understanding of the new risk calculated for the service. Because we cannot anticipate every possible security flaw through automation alone, the last part of the process is deciding if and where to add manual testing before the service is released.
When do we add manual testing?
- If there is a new endpoint exposed or large changes to an existing endpoint
- Input is received from outside entities that touch a new database, microservice or any other data source which can affect the system
- For example – /?user=rotem → ServiceA -> ServiceB -> “SELECT …”
- A jump in the number of new functions running as a result of invoking this endpoint
- Usually 3-4 new functions found in a flow means there were significant changes to the codebase
- Major vs. Minor version upgrades
- 4.5.1 -> 4.6
- Security Code Leads / False Positives –
- In my experience, some of the "false positives" found are actually real security leads: good indications of true vulnerabilities waiting to be triggered, which automated systems could not assess.
- By giving your security researchers the proper access to all information and an easy way to investigate and test new services, we can eliminate security bugs before they happen.
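The triggers above can be sketched as a simple checklist. The 3-function threshold and the major/minor version rule mirror the heuristics in this list; the `ReleaseDiff` structure and its field names are assumptions for illustration:

```python
# Sketch of "when do we add manual testing?" as code. The inputs would
# come from diffing instrumentation and scan data between releases.
from dataclasses import dataclass, field

@dataclass
class ReleaseDiff:
    new_endpoints: list[str] = field(default_factory=list)
    new_external_data_sources: list[str] = field(default_factory=list)
    new_functions_in_flow: int = 0
    old_version: str = "0.0.0"
    new_version: str = "0.0.0"

def is_major_or_minor_bump(old: str, new: str) -> bool:
    o, n = old.split("."), new.split(".")
    return o[0] != n[0] or o[1] != n[1]  # e.g. 4.5.1 -> 4.6 qualifies

def needs_manual_testing(diff: ReleaseDiff) -> list[str]:
    """Return the (possibly empty) list of reasons to test manually."""
    reasons = []
    if diff.new_endpoints:
        reasons.append("new or heavily changed endpoint")
    if diff.new_external_data_sources:
        reasons.append("external input reaches a new data source")
    if diff.new_functions_in_flow >= 3:  # the "3-4 new functions" rule of thumb
        reasons.append("significant codebase change in the flow")
    if is_major_or_minor_bump(diff.old_version, diff.new_version):
        reasons.append("major/minor version upgrade")
    return reasons
```

Returning reasons rather than a bare boolean matters in practice: the security engineer triaging the release needs to know *why* a manual pass was requested.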
Bringing it all together
Most solutions today look at Static, Dynamic and Instrumentation Analysis as three separate solutions which uncover different types of findings. We believe that these three solutions should work closely together in a seamless way, combining the results and all the positive and false-positive findings from the different engines to provide a more holistic picture of the security posture.
The IAST (instrumentation tracing) piece is crucial. Learning how all of your services work together and interact with each other as well as internally inside the codebase, will give you the exact flow of every finding from the other engines.
This methodology answers questions such as: which endpoints lead to this specific finding? Should we run a task for our DAST engine to thoroughly scan a specific endpoint and self-validate?
Another advantage of combining the engines is that we can now see entire areas that were completely missed or overlooked by the scanners and automation in place. If the static analysis engine reports findings for a service, yet we never saw any traffic going through it, that flags that the dynamic engines never even tested this service. We can then ask QA to add the correct testing flows for this specific endpoint, providing more accurate and comprehensive security coverage.
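The coverage-gap check described here can be sketched in a few lines, assuming we can count static findings per service and collect the set of services that recorded traffic actually reached (the service names below are made up):

```python
# Cross-reference static findings with observed traffic: a service with
# findings but no recorded traffic was never exercised dynamically.
def untested_services(static_findings: dict[str, int],
                      observed_traffic: set[str]) -> set[str]:
    """Services with static findings that no recorded traffic reached."""
    return {svc for svc, count in static_findings.items()
            if count > 0 and svc not in observed_traffic}

gaps = untested_services(
    {"billing-api": 3, "user-service": 1, "health-check": 0},
    observed_traffic={"user-service"},
)
# billing-api has static findings but was never hit by dynamic testing,
# so QA should add testing flows that reach it.
```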
As a startup, Appsflyer decided to use more niche technologies such as Clojure, which current SAST, DAST and IAST vendors do not yet support.
To close this gap, we are internally building a system that combines these solutions and handles our scale; it will work not only for Clojure but will be easily adaptable to any language you use.
Eventually we will also open source the system for other companies wishing to work in the same way, and hopefully enjoy outside contributions to the system itself.
Some of the functionality I've found I need in my work as a security engineer, in order to have a holistic view of our projects, includes:
- A service page with exposure info and quick links to all endpoints and to the risk score
- The path of a lead from an external endpoint to the exact line of code where the bug was found
- A security rule-building dashboard, where security engineers can add more rulesets to be analyzed and caught by all engines (static, dynamic and instrumentation)
I hope this gave you a glimpse of the complexity of keeping up with code changes and securing large microservice architectures at scale.
This battle is fought every day in thousands of companies, with a delicate balance between bringing the best possible products to market, keeping up with new technologies, and securing the whole ecosystem.
Think about what solutions you currently have in place to verify that your code is secure. And the next time you are building a startup, product, or tool, a personal request: please integrate security scanning solutions from your first line of code.