The Skills Everyone Is Teaching Are Not Enough
AI engineer roles are paying north of $300K right now. The conventional advice for landing one: learn prompt engineering and RAG. That advice is incomplete. After coaching over 200 people into AI and machine learning roles, Marina Wyss, senior applied scientist at Twitch, has a different view of what actually separates candidates who get those roles from the ones who do not.
Skill 1: Testing
Nobody ships production code without tests. The reason is simple: code fails in dozens of ways, and tests catch failures before they reach users.
AI systems are harder to test. LLMs are non-deterministic by design: the same prompt can return a slightly different answer on every run. The failure modes are nuanced, inconsistent, and hard to reproduce. Most people building with AI skip testing because it feels harder than software testing. The engineers who get hired at $300K are the ones who figured out how to test AI systems rigorously anyway.
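One common way to test output that changes between runs is to assert properties of the answer rather than exact strings, and to measure a pass rate over many samples. The sketch below is only an illustration of that idea, not a method described by the source: `fake_llm_summarize` is a hypothetical stand-in for a real model call, and the three properties checked are assumptions chosen for the example.

```python
import random
import re


def fake_llm_summarize(text: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call: the wording varies between runs."""
    openers = ["In short,", "Summary:", "Briefly,"]
    first_sentence = text.split(".")[0].strip()
    return f"{rng.choice(openers)} {first_sentence}."


def check_summary(source: str, summary: str) -> list[str]:
    """Return the violated properties; an empty list means the output passes."""
    problems = []
    if len(summary) > len(source):
        problems.append("summary longer than source")
    if not summary.endswith("."):
        problems.append("summary does not end with a period")
    # Every number in the summary must also appear in the source.
    source_numbers = set(re.findall(r"\d+", source))
    for number in re.findall(r"\d+", summary):
        if number not in source_numbers:
            problems.append(f"invented number: {number}")
    return problems


def pass_rate(source: str, runs: int = 50, seed: int = 0) -> float:
    """Sample the model repeatedly and report the fraction of passing outputs."""
    rng = random.Random(seed)
    passed = sum(
        1
        for _ in range(runs)
        if not check_summary(source, fake_llm_summarize(source, rng))
    )
    return passed / runs
```

A test suite would then assert that `pass_rate(...)` stays above a chosen threshold instead of comparing against one fixed answer.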
Skill 2: Evaluation
How do you know your AI system is getting better? Benchmark it. Not just once, but continuously. Production AI engineers build evaluation pipelines that run automatically when models change, when prompts change, or when the underlying data shifts.
Without evaluation, you are guessing. With evaluation, you know. The difference between a system that improves over time and one that quietly degrades is almost always whether someone built the evaluation infrastructure.
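A minimal sketch of such a pipeline, under stated assumptions: a fixed golden set, a keyword-match grader, and a gate that blocks a change if the score regresses. The names `EvalCase`, `run_eval`, and `gate`, the sample questions, and the 0.02 tolerance are all hypothetical. Real pipelines typically use richer graders than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected_keyword: str  # hypothetical grading rule: the answer must mention this


# Illustrative golden set; a real one would come from production data.
GOLDEN_SET = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Which planet is known as the Red Planet?", "Mars"),
    EvalCase("Which gas do plants absorb?", "carbon dioxide"),
]


def run_eval(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected keyword."""
    hits = sum(
        1
        for case in cases
        if case.expected_keyword.lower() in system(case.prompt).lower()
    )
    return hits / len(cases)


def gate(candidate_score: float, baseline_score: float, tolerance: float = 0.02) -> bool:
    """Allow a change only if it does not regress beyond the tolerance."""
    return candidate_score >= baseline_score - tolerance
```

Running `run_eval` on every model, prompt, or data change and passing the result through `gate` is what turns a one-off benchmark into a continuous check.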
Skill 3: Context Engineering
Context engineering is organizing what the model sees at every step. That means not just the initial prompt but also the system prompt, tool definitions, results from previous tool calls, and conversation history. Strong context engineering is what separates AI engineers who can build high-quality production systems from the rest.
Most learning resources skip this. They cover how to write a prompt. They do not cover how to architect a multi-step agent so that the context at step 12 still contains the information the model needs from step 3.
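The step-3-to-step-12 problem can be sketched as a context builder that pins important messages so they survive trimming, then fills the remaining budget with the most recent history. This is one possible approach under simplifying assumptions, not the source's method: `Message`, `build_context`, and the word-count token estimate are hypothetical, and step numbers are assumed to be unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    step: int  # assumed unique per message
    text: str
    pinned: bool = False  # pinned messages must survive trimming


def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def build_context(system_prompt: str, history: list[Message], budget: int) -> list[str]:
    """Keep the system prompt and pinned messages, then add the newest history that fits."""
    pinned = [m for m in history if m.pinned]
    used = estimate_tokens(system_prompt) + sum(estimate_tokens(m.text) for m in pinned)
    if used > budget:
        raise ValueError("pinned content alone exceeds the token budget")

    kept = {m.step for m in pinned}
    # Walk backwards from the most recent step, adding messages while they fit.
    for message in reversed(history):
        if message.pinned:
            continue
        cost = estimate_tokens(message.text)
        if used + cost > budget:
            break
        kept.add(message.step)
        used += cost

    # Emit in original order so the model sees a coherent timeline.
    return [system_prompt] + [m.text for m in history if m.step in kept]
```

With a tight budget, a plain "keep the last N messages" policy would drop the step-3 fact; pinning keeps it in the context while older, less important steps are trimmed.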
Skill 4: Observability
When something goes wrong in production (and it will), you need to be able to see what happened. What prompt was sent? What did the model return? What tool calls were made? What was in the context when the error occurred?
AI systems are harder to debug than traditional software because the failure is not a traceback; it is a subtly wrong answer delivered with confidence. Observability tools for AI let you trace that answer back to its cause. Building this infrastructure is unglamorous. Debugging production AI without it is a nightmare.
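As a rough illustration of what such tracing records, the sketch below wraps each model call and stores the prompt, context, tool calls, response, error, and latency as one structured record. The `Tracer` class and its field names are hypothetical and keep records in memory. A production setup would more likely send them to a dedicated tracing backend.

```python
import json
import time
import uuid
from typing import Callable, Optional


class Tracer:
    """Collects one structured record per model call (in memory for this sketch)."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def traced_call(
        self,
        model: Callable[[str], str],
        prompt: str,
        context: list[str],
        tool_calls: Optional[list[dict]] = None,
    ) -> str:
        started = time.perf_counter()
        record = {
            "trace_id": str(uuid.uuid4()),
            "prompt": prompt,
            "context": list(context),  # snapshot of what the model saw
            "tool_calls": tool_calls or [],
            "response": None,
            "error": None,
        }
        try:
            record["response"] = model(prompt)
            return record["response"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so every call leaves a record.
            record["latency_s"] = time.perf_counter() - started
            self.records.append(record)

    def dump(self) -> str:
        """Serialize the records as JSON lines for later inspection."""
        return "\n".join(json.dumps(r) for r in self.records)
```

Because the record is written in the `finally` clause, a confident but wrong answer and a raised exception both end up in the same trace log, with the context that produced them.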
Skill 5: System Design
The skill that wraps all the others: knowing when to use AI and when not to. Knowing which part of a workflow benefits from an LLM and which part should be deterministic code. Knowing how to compose tools, models, and traditional software into a system that is reliable, observable, and testable.
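A small sketch of that division of labor, with assumed details: a regex handles the part of the workflow that is fully deterministic, the model is called only when the regex finds nothing, and a fixed fallback reply covers model failures. The order-ID format, the `handle` function, and the reply strings are all hypothetical.

```python
import re
from typing import Callable, Optional

# Assumed format for this example: the word "order" followed by a six-digit ID.
ORDER_ID = re.compile(r"\border\s+#?(\d{6})\b", re.IGNORECASE)


def extract_order_id(message: str) -> Optional[str]:
    """Deterministic path: a regex is cheaper and more predictable than a model."""
    match = ORDER_ID.search(message)
    return match.group(1) if match else None


def handle(message: str, llm: Callable[[str], str]) -> dict:
    """Route to deterministic code first, then the model, then a safe fallback."""
    order_id = extract_order_id(message)
    if order_id is not None:
        return {"route": "deterministic", "reply": f"Looking up order {order_id}."}
    try:
        return {"route": "llm", "reply": llm(message)}
    except Exception:
        # Degrade gracefully instead of surfacing the failure to the user.
        return {
            "route": "fallback",
            "reply": "Sorry, I could not process that. A human agent will follow up.",
        }
```

Returning the route alongside the reply also makes the system easier to test and observe, since each path can be asserted on and counted separately.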
The $300K roles are not looking for someone who can write clever prompts. They are looking for engineers who can build systems that work in production, degrade gracefully, and improve over time. That is a system design problem with AI as one of the components.