ZHT Lab works on reasoning, multimodal systems, and evaluation infrastructure for AI products operating in ambiguous, high-consequence environments.
Research memo
A narrow program across reasoning, multimodal intelligence, evaluation, and operating discipline.
Selective work with founders, operators, researchers, and investors who care about durable AI systems.
Reasoning
Planning, tool use, and recovery logic for tasks that change while they are being solved.
Multimodal
Language, perception, and structured state combined without obscuring operator judgment.
Evaluation
Benchmarks, red-teaming, and review loops that surface failure before deployment.
Operating model
Small senior teams, selective collaborations, and direct accountability for system quality.
Research first
We start with technical questions that matter, not packaging around familiar demos.
Systems view
Models, orchestration, interfaces, and evaluation are treated as one operating system.
Selective scope
Fewer bets, deeper context, and higher standards than generic AI product work.
We work at the point where frontier model capability has to become dependable system behavior, and where model behavior, system architecture, and operating reality have to agree.
ZHT Lab stays deliberately narrow. We focus on intelligent systems that need to reason, observe, recover, and remain legible when the environment stops being clean.
That means designing beyond the model alone: architecture, interfaces, control points, and evaluation have to be decided together.
Working posture
Research depth, product realism, and calm execution are treated as one operating discipline.
01
Research depth
We care about underlying behavior, not just the surface demo.
02
Product realism
Ideas are shaped against latency, failure, interfaces, and trust from the start.
03
Calm execution
Small teams, direct feedback, and a higher bar for clarity.
A small set of technical tracks where better research changes how intelligent products are built, measured, and trusted.
01
Reasoning
Planning, retrieval, tool use, and recovery for tasks that change while they are being solved.
Current focus
Memory, control flow, and the boundary between model judgment and explicit logic.
System implication
Systems that remain dependable even as the task changes under them.
02
Multimodal
Language, vision, and state working together when text alone is not enough.
Current focus
Grounded perception, cross-modal memory, and interfaces operators can still understand.
System implication
Products that can observe, reason, and act without becoming opaque.
03
Evaluation
Benchmarks, review loops, and telemetry that make model quality visible before deployment.
Current focus
Adversarial testing, scenario coverage, and feedback systems that surface failure early.
System implication
A stronger bridge from promising research to credible systems.
The work only matters when it survives translation into architecture, evaluation, and dependable operation.
Operating method
The work spans the stack required to turn promising behavior into dependable operation.
01
Frame the problem against technical and product constraints.
02
Prototype quickly, but measure against explicit standards.
03
Design the system around the model, not just the prompt.
04
Close the loop with evaluation, review, and production feedback.
01
Applied research
Turn frontier model behavior into hypotheses, experiments, and product decisions.
In practice
Structured experiments with explicit success criteria and short iteration loops.
02
System architecture
Design the system around the model: tools, memory, routing, orchestration, and review.
In practice
Architectures that treat model output as one layer inside a larger machine.
03
Evaluation and red-teaming
Build evaluation suites that expose reliability gaps before they reach users.
In practice
Offline benchmarks, adversarial probes, and reviewer loops tied to real failure modes.
04
Agent workflow design
Shape planning, execution, oversight, and recovery flows for complex tasks.
In practice
Interfaces and policies that preserve both autonomy and operator control.
05
Feedback and deployment loops
Feed production behavior back into research through telemetry and structured review.
In practice
A path from prototype insight to measurable long-term improvement.
The institution is designed to preserve technical taste, make sharper decisions, and stay legible as the work gets harder.
Institutional thesis
The company should feel like the systems we admire: precise, legible, and resilient under pressure.
The goal is not to look larger. It is to make better decisions, protect taste, and keep the work understandable as complexity rises.
01
Taste in system boundaries
We know when to automate, when to keep logic explicit, and where review still matters.
02
Research without theater
We prefer hypotheses, measurement, and legible progress over noise and posture.
03
Execution density
Shared context and uncompromising standards let small senior teams move faster.
04
Credibility under pressure
Our systems are designed to remain understandable when inputs, environments, and stakes change.
Dense context, direct feedback, and unusually high standards for people who like demanding technical work.
Operating note
We optimize for dense context, technical honesty, and the ability to turn insight into systems with clear behavior.
01
Small teams, dense context
Fewer people, more ownership, deeper shared understanding.
02
Technical honesty
We say what is working, what is not, and what still needs proof.
03
Research with deadlines
Curiosity matters, but momentum does too. We work toward decisions.
04
Long-horizon ambition
We care about foundational capabilities and the systems that will matter years from now.
We prefer precise conversations with people building durable systems, not novelty for its own sake.
Correspondence
The best conversations start with a real constraint: a system under pressure, a research question worth pursuing, or a hiring problem worth solving carefully.
Direct lines
Best for
01
Founders and operators building serious AI products
02
Investors tracking frontier intelligent systems
03
Researchers and engineers looking for unusually rigorous work