{"id":692,"date":"2009-03-08T03:27:08","date_gmt":"2009-03-07T21:57:08","guid":{"rendered":"http:\/\/sandeepmathew.wordpress.com\/2009\/03\/08\/system-programming-game-development-and-kernel-development"},"modified":"2009-03-08T03:27:08","modified_gmt":"2009-03-07T21:57:08","slug":"system-programming-game-development-and-kernel-development","status":"publish","type":"post","link":"https:\/\/sandeepmathew.com\/index.php\/2009\/03\/08\/system-programming-game-development-and-kernel-development\/","title":{"rendered":"System Programming , Game Development and Kernel Development"},"content":{"rendered":"<h1>Practical Compiler Development Part 1<\/h1>\n<p>This is a rather informal introduction to development of  a hobby compiler . The more formal chapters on compiler development  will be given in later tutorials. By the end of this tutorial you will  be able to create a simple interpreter . This can be easily converted  into a compiler. The requirements for following this tutorial are<\/p>\n<p>1. A reasonable grasp of C \/ C++ \/ Python \/ C# \/ Java<\/p>\n<p>2. An elementary knowledge of basic data structures and recursion<\/p>\n<p><strong> <\/strong><\/p>\n<p><strong>Compiler defined<\/strong><\/p>\n<p><strong> <\/strong><\/p>\n<p>A compiler is defined as a program that converts a  source file in a specified form to an output file that can be fed to an  assembler or simulated by a machine (virtual \/ real). Most of the  compiler convert the given source file into assembly language that can  be assembled by an assembler of the target machine.<\/p>\n<p>For example to see the code generated by your source program in Turbo C++ , type the following<\/p>\n<p>tcc -S source.c<\/p>\n<p><strong> <\/strong><\/p>\n<p><strong>Phases of a compiler<\/strong><\/p>\n<p><strong> <\/strong><\/p>\n<p>Lexical analysis : The lexical analysis phase of the  compiler essentially does the tokenizing . This phase takes a stream of  text as input and converts it into tokens required for the parser . For  example the lexical analyzer splits the C source into if , else , while ,  int etc . The lexical analyzer can be constructed using a hard coded  dfa or by using a generator like lex . In the next part of this tutorial  I will describe how to write your own lex like generator . In this  tutorial we will use an ad hoc lexical analyzer which will be sufficient  for the purpose of demonstration. In fact you will be surprised to see  that this sort of approach is taken in small hobby compilers like small-  C .<\/p>\n<p>Syntax Analysis : The lexemes must be arranged in such  form that reflects the syntactic structure of the language . This is  called syntax analysis or parsing . In this tutorial we will discuss  only a form of top down parsing known as recursive descent parsing with  one look ahead . Every programming language has a grammar which is  usually represented in BNF form . A CFG grammar of a programming  language is defined by the following<\/p>\n<p>1. A set of terminal symbols that cannot be substituted<\/p>\n<p>2. A set of non terminals which can be substituted by terminal or other non terminals<\/p>\n<p>3. A set of Productions that define the grammar (of the form NT -&gt; A | B , ie LHS should have<\/p>\n<p>only one nonterminal)<\/p>\n<p>4. A start symbol from which everything begins<\/p>\n<p>Eg A grammar for the string aaaa ..any number of times is<\/p>\n<p>S -&gt; \u201ca \u201c S { terminals are enclosed in \u201c \u201c and non terminals in capital letters }<\/p>\n<p>{ | used to denote OR }<\/p>\n<p>S -&gt; nothing<\/p>\n<p>ie this means that S can be an \u201ca\u201d followed by S itself or S can be nothing .<\/p>\n<p>Lets see how aaa is derived<\/p>\n<p>S-&gt;aS =&gt; S-&gt;aaS ( using S -&gt; aS) =&gt; S-&gt;aaaS(using S-&gt;aS) =&gt; S-&gt;aaa (using S-&gt; nothing_left) .<\/p>\n<p>Now lets see how a recursive descent parser is  constructed for this grammar. This is easily done by replacing each non  terminal by function call and performing a match for the terminal<\/p>\n<p>Void parseS()<\/p>\n<p>{<\/p>\n<p>match_current_input_token_for(\u201ca\u201d);<\/p>\n<p>if( not_matched) error();<\/p>\n<p>if(end_of_input) return ;<\/p>\n<p>parseS();<\/p>\n<p>}<\/p>\n<p>In order to construct a recursive descent parser like the above the grammar should follow these<\/p>\n<p>conditions .<\/p>\n<p>For each production of the form S \u2013 &gt; A | C | E .<\/p>\n<p>The first sets of each non-terminal in each of this production must be disjoint<\/p>\n<p>ie FIRST (A) intersection FIRST (C) intersection FIRST(E) = NULL<\/p>\n<p>The first sets and the follow sets of a nonterminal must be disjoint ie \u2026<\/p>\n<p>FIRST(A) intersection FOLLOW (A) = NULL<\/p>\n<p>FIRST(C) intersection FOLLOW ( C ) = NULL<\/p>\n<p>\u2026so on \u2026<\/p>\n<p>This implies that the first set should not be equal to the follow set of the non- terminal.<\/p>\n<p>The the first set of a terminal is the symbol itself.  The first set of the non \u2013 terminal is the set of terminals with which  the nonterminal can begin with . The follow set of a non terminal is the  set of terminals which can come after the non terminal. Lets find out  the First n Follow for the following<\/p>\n<p>grammar .<\/p>\n<p>S -&gt; \u201ca\u201dA | C D<\/p>\n<p>A-&gt;\u201db\u201dA |\u201dc\u201d<\/p>\n<p>C -&gt; \u201ce\u201d|nothing<\/p>\n<p>D-&gt;\u201dd\u201d | \u201cg\u201d D<\/p>\n<p>FIRST(S) = FIRST ( \u201ca\u201dA) union FIRST(CD)<\/p>\n<p>= \u201ca\u201d union FIRST( C )<\/p>\n<p>= \u201ca\u201d union FIRST ( C ) union FIRST (D) { since C can be nothing , first of D must be<\/p>\n<p>included}<\/p>\n<p>= \u201ca\u201d union FIRST ( \u201ce\u201d ) union FIRST (\u201cd\u201d) union FIRST (\u201cg\u201d)<\/p>\n<p>= \u201ca\u201d union \u201ce\u201d union \u201cd\u201d union \u201cg\u201d<\/p>\n<p>= { a , e , d , g }<\/p>\n<p>Similarly FOLLOW(C) = FIRST(D) = { d , g } etc \u2026  Computing the first and follow of the rest of the symbols in the grammar  is left as an exercise to you . These rules have to followed in order  to avoid conflicts in the predictive parsing table. More about advanced  parsing methods in later chapters .<\/p>\n<p>Now i present a more formal algorithms for finding first  and follow .is given below . You might find it easier to follow the  informal definitions that the algorithms given below.<\/p>\n<p>Finding all nothingable non terminals<\/p>\n<p>1) if given A -&gt; nothing the set NOTINGABLE(A) = true<\/p>\n<p>2) while no changes made in NOTINGABLE table<\/p>\n<p>3) if production such that A -&gt; A1 A2 A3 \u2026 Ak and all of NOTHINGABLE(A1)<\/p>\n<p>NOTHINGABLE(A2) \u2026 NOTHINGABLE(Ak) = true<\/p>\n<p>4) then NOTINGABLE(A) = true<\/p>\n<p>Finding first<\/p>\n<p>1) if terminal then set FIRST (Symbol) = symbol<\/p>\n<p>2) while no changes made in FIRST table<\/p>\n<p>3) if production such that A -&gt; A1 A2 A3 \u2026 Ak<\/p>\n<p>4) FIRST(A) = FIRST(A) union FIRST (A1)<\/p>\n<p>5) while (NOTHINGABLE(A1 ) , NOTHINGABLE(A2) .. NOTHINGABLE(Aj). = TRUE)<\/p>\n<p>6) FIRST(A) = FIRST(A) U FIRST(A1) U FIRST (A2) U FIRST (A3) \u2026<\/p>\n<p>FIRST(Aj+1)<\/p>\n<p>Finding follow<\/p>\n<p>1) follow is not defined for terminals<\/p>\n<p>2) follow of the Start symbol is the end of input marker<\/p>\n<p>3) if there exits a nonterminal A such that B -&gt; E A D then<\/p>\n<p>4) add FIRST(D) to FOLLOW(A)<\/p>\n<p>5) if there exists a production such that A -&gt; G H such that NOTHINGABLE(H) = true then<\/p>\n<p>6) everything in FOLLOW (A) is also in FOLLOW (H)<\/p>\n<p>As a final example, grammar for if statement.<\/p>\n<p>S -&gt; IFCONDTION<\/p>\n<p>IFCONDITON -&gt; \u201cif\u201d EXPRESSION \u201cthen\u201d STATEMENT<\/p>\n<p>EXPRESSION -&gt; TERM \u201c+\u201d TERM<\/p>\n<p>TERM -&gt; FACTOR *FACTOR<\/p>\n<p>FACTOR -&gt; id<\/p>\n<p>EXPRESSION -&gt; \u201c(\u201c EXPRESSION \u201c)\u201d<\/p>\n<p>STATEMENT -&gt; id \u201c;\u201d STATEMENT | nothing<\/p>\n<p>It can be seen that right recursion in many cases is unneeded and can be removed using a simple loop .<\/p>\n<p>As right recursion leads to tail recursion. Eg<\/p>\n<p>The grammar S -&gt;aS | nothing can be more compactly written as S-&gt; (a)*<\/p>\n<p>The parser reduces to<\/p>\n<p>parseS()<\/p>\n<p>{<\/p>\n<p>while(current_token == \u201ca\u201d ) advance_input_pointer();<\/p>\n<p>}<\/p>\n<p>Left recursion will lead to an infinite loop and should be removed .For eg the grammar<\/p>\n<p>S -&gt; S B | D<\/p>\n<p>can be re written as<\/p>\n<p>S -&gt; D S_DASH<\/p>\n<p>S_DASH -&gt; B S_DASH<\/p>\n<p>Common factor in grammars should also be removed eg for the grammar<\/p>\n<p>S -&gt;\u201d if\u201d condition \u201cthen\u201d statement<\/p>\n<p>S -&gt; \u201cif\u201d condition \u201cthen\u201d statement \u201celse\u201d statement<\/p>\n<p>can be modified into<\/p>\n<p>S-&gt;AB<\/p>\n<p>A -&gt; \u201d if\u201d condition \u201cthen\u201d statement<\/p>\n<p>B -&gt; \u201celse\u201d statement | nothing<\/p>\n<p>Semantic Analysis : Syntax analysis does not tell anything about the meaning of the statement under<\/p>\n<p>consideration . It simply check whether the syntax is correct . For eg<\/p>\n<p>const int i = 20;<\/p>\n<p>i++;<\/p>\n<p>The above lines are syntactically correct but semantically wrong . The semantics are sometimes given<\/p>\n<p>using attribute grammars . .More about this later.<\/p>\n<p>Code Generation : This is the final phase where the  assembly code to the target machine is generated . I have skipped many  other phases which are part of a more mature compiler. The purpose of  this tutorial is to give quick introduction . The most of the hobby  compilers the major component is the parser and code generation and  semantic analysis is done when each non terminal is encountered in the  parser.<\/p>\n<p>As an example of developing a simple interpreter , I  have provided the source code of a simple basic like interpreter. I have  used the mingw port of gcc for compiling the source . I used dev-cpp as  the ide .<\/p>\n<p>The corresponding project and source files are also submitted .<br \/>\nDownload the source code in pdf format.<a href=\"http:\/\/sandeepmathew.wordpress.com\/wp-content\/uploads\/2009\/03\/source.pdf\">clicky<\/a> , Basically , what you have to do is copy paste the code and compile it with a modern C++ compiler .<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Practical Compiler Development Part 1 This is a rather informal introduction to development of a hobby compiler . The more formal chapters on compiler development will be given in later tutorials. By the end of this tutorial you will be able to create a simple interpreter . This can be easily converted into a compiler. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[9,10,16],"_links":{"self":[{"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/posts\/692"}],"collection":[{"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/comments?post=692"}],"version-history":[{"count":0,"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/posts\/692\/revisions"}],"wp:attachment":[{"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/media?parent=692"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/categories?post=692"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sandeepmathew.com\/index.php\/wp-json\/wp\/v2\/tags?post=692"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}