In traditional operating systems, user programs incur system call overhead because every call crosses the protection boundary between user mode and kernel mode. This overhead can be eliminated if user programs can be executed _safely_ in kernel mode. We achieve this by developing a safe kernel mode execution mechanism based on _TAL_ (Typed Assembly Language). TAL is an assembly language that ensures the memory safety and control flow safety of machine code through a type system. Memory safety means that a program accesses only the memory it is permitted to access; control flow safety means that a program jumps only to valid code it is permitted to execute. Both properties are verified by a type checker using type annotations that the TAL assembler attaches to the machine code. In our approach, user programs are written in TAL, and their safety is verified by the TAL type checker _before_ they are executed in kernel mode. User programs can therefore run in kernel mode both safely and efficiently: safety is established before execution, so little runtime checking overhead remains. Moreover, unlike other approaches to safe kernel mode execution, such as the SPIN operating system and PCC (Proof-Carrying Code), our approach neither depends on a specific high-level programming language and its compiler nor requires expensive computation of complex proofs. We implemented a prototype system based on our approach by modifying the Linux kernel. The prototype uses the original system call functions of the Linux kernel as its interface to user programs, and thus achieves the _same_ degree of safety (e.g., file access control) while eliminating only the overhead of system calls.
For performance evaluation, we implemented a TAL version of the ``find'' program, which traverses the directory trees of a file system, on our prototype system; it ran 14% faster in kernel mode than in user mode. We also ran a TAL version of the ``echod'' program, which receives data from a client and sends it back to the client; its latency improved by 4 microseconds.
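To make the idea of verifying safety before execution more concrete, the following is a minimal sketch of a static checker for a toy instruction set. It is not TAL and not the checker used in the prototype: the instruction names (`movi`, `load`, `store`, `jmp`, `halt`), the two types (`int` and a sized pointer), and all function names are hypothetical, chosen only for illustration. Each labeled block declares the register types it expects on entry; the checker rejects out-of-bounds memory accesses (memory safety) and jumps to unknown labels or to labels whose declared entry types are not satisfied (control flow safety), all without running the program.

```python
"""Toy pre-execution verifier, loosely inspired by typed assembly languages.

Everything here (instruction set, types, names) is an illustrative assumption.
"""

INT = "int"


def ptr(size):
    """Type of a pointer to a block of `size` accessible words."""
    return ("ptr", size)


class VerificationError(Exception):
    """Raised when a block cannot be shown to be safe."""


def _check_access(regs, reg, offset):
    """Memory safety: `reg` must hold a pointer and `offset` must be in bounds."""
    ty = regs.get(reg)
    if not (isinstance(ty, tuple) and ty[0] == "ptr"):
        raise VerificationError(f"{reg} is not a pointer")
    if not 0 <= offset < ty[1]:
        raise VerificationError(f"offset {offset} is out of bounds for {reg}")


def check_block(label, blocks):
    """Check one block against its declared entry types; nothing is executed."""
    entry_types, code = blocks[label]
    regs = dict(entry_types)  # register name -> type currently known for it
    for instr in code:
        op = instr[0]
        if op == "movi":  # ("movi", rd, constant)
            regs[instr[1]] = INT
        elif op == "load":  # ("load", rd, rs, offset)
            _, rd, rs, offset = instr
            _check_access(regs, rs, offset)
            regs[rd] = INT
        elif op == "store":  # ("store", rs, offset, rv)
            _, rs, offset, rv = instr
            _check_access(regs, rs, offset)
            if regs.get(rv) != INT:
                raise VerificationError(f"{rv} does not hold an int")
        elif op == "jmp":  # ("jmp", target_label)
            target = instr[1]
            # Control flow safety: the target must be a known label whose
            # declared entry types are satisfied by the current registers.
            if target not in blocks:
                raise VerificationError(f"unknown jump target {target}")
            for reg, ty in blocks[target][0].items():
                if regs.get(reg) != ty:
                    raise VerificationError(f"{reg} has the wrong type for {target}")
            return  # a block ends at its first jmp or halt
        elif op == "halt":
            return
        else:
            raise VerificationError(f"unknown instruction {op}")
    raise VerificationError(f"block {label} falls off its end")


def is_safe(blocks):
    """Return True only if every block in the program passes the checker."""
    try:
        for label in blocks:
            check_block(label, blocks)
    except VerificationError:
        return False
    return True


# A program that stays within its 4-word buffer and jumps only to a valid label.
SAFE = {
    "start": ({"r1": ptr(4)},
              [("movi", "r2", 7), ("store", "r1", 0, "r2"), ("jmp", "loop")]),
    "loop": ({"r1": ptr(4), "r2": INT},
             [("load", "r3", "r1", 3), ("halt",)]),
}

# Reads one word past the end of the buffer: violates memory safety.
BAD_LOAD = {"start": ({"r1": ptr(4)}, [("load", "r2", "r1", 4), ("halt",)])}

# Jumps to a label that does not exist: violates control flow safety.
BAD_JUMP = {"start": ({}, [("jmp", "nowhere")])}

# Jumps to a valid label without establishing the types it requires.
BAD_ENTRY = {
    "start": ({}, [("jmp", "loop")]),
    "loop": ({"r1": ptr(4)}, [("halt",)]),
}
```

Calling `is_safe(SAFE)` accepts the first program, while the other three are rejected before any instruction would run. Because each block is checked only against declared entry types, the check is a single linear pass over the code, which is the property that keeps runtime overhead low in this style of verification.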