root/docs/03Evaluator.pod

Revision 7806, 10.7 kB (checked in by autrijus, 3 years ago)

* Wendy's notes of what can hopefully become PA02 that is accessible to nonjargon audiences

  • Property svn:mime-type set to text/plain; charset=UTF-8
  • Property svn:eol-style set to native
Line 
1=pod
2
3=head1 NAME
4
5Pugs Apocryphon 2 - Overview of Pugs Internals
6
7=head1 DATE
8
9This document could get out of date very quickly. If it seems that more than
10a week has passed between the time there was an update to the time you read
11these words, prod someone on C<#perl6>, or read
12L<http://use.perl.org/~autrijus/journal> and see if there's been any big
13change.
14
15The current copy was last revised on 2005-06-03.
16
17=head1 PRELUDE
18
19Pugs is written in the Haskell language. Before dabbling with Pugs internals
20it may be wise to study a bit of Haskell.
21
22=head1 INTRODUCTION
23
24Pugs is a versatile project, tapping into the power of many other projects.
25Pugs itself fits into a star topology, optionally using these projects to
26gain more features.
27
28Each will be discussed later in more detail.
29
30=head1 PUGS RUNTIME OVERVIEW
31
32Pugs's own core is also componentized. The separation roughly coincides with
33Perl 6's runtime and compile time. However, it is notable that the parts are
34intermixed, since Perl 6 is a very dynamic language.
35
36The two parts are roughly:
37
38=head2 The Parser
39
40This part takes a string of Perl 6 and converts it into the AST, or Abstract
41Syntax Tree. The AST represents the program's structure, which the evaluator
42later executes.
43
44This part is responsible for the compilation of Perl 6 source code.
45
46=head2 The Evaluator
47
48The evaluator combines the program's AST with what is known as the I<Env>,
49or roughly speaking, the current state of the program's execution.
50
51It walks the nodes of the tree, reducing them into values. Of course, the
52interesting part is what happens during the reduction - this is the actual
53execution of the code.
54
55This part is reponsible for the runtime - the execution of compiled Perl 6
56ASTs.
57
58=head1 SOURCE TREE OVERVIEW
59
60This section does not discuss the files in detail. Pugs is documented with
61Haddock, and for reference that is the place to look.
62
63What this section B<does> provide is an overview of the responsibilities each
64part has in overall structure of Pugs.
65
66=head2 F<src/Pugs/AST.hs>
67
68This file contains the definitions of the AST's types.
69
70It is more or less a description of how Perl 6 programs can look after
71compilation.
72
73=head2 F<src/Pugs/Parser.hs>
74
75This file contains the parser for Perl 6 code. It is written using the Parsec
76library.
77
78It produces Syn and Exp structures as defined in F<AST.hs>, and puts them in
79the envBody of the env.
80
81=head2 F<src/Pugs/Eval.hs>
82
83This file implements the evaluation logic for the AST. Its main job is
84reducing Exps into Vals. Most Exps require applying VCode objects, which
85represent closures (blocks, subroutines, operators...), looking up
86variables, or other basic features Perl 6 provides, and this is where most
87of this is coded.
88
89=head2 F<src/Pugs/Prim.hs>
90
91This file contains the implementations of many of the primitive operators.
92For example, the addition operator, C<< &infix:<+> >> is defined here. It
93converts the two Perl values it gets into Haskell Nums, applies Haskell's
94builtin addition operator to these, and then makes a Perl value out of the
95result.
96
97The various operators and builtin functions are implemented using the opN
98function, and the definition of their Perlish behavior is defined in the
99table at the bottom.
100
101The table basically says whether the builtin is infix or not, how many
102parameters it accepts, and so forth.
103
104=head2 F<src/Pugs/Run.hs>
105
106This is the file that ties it all together, it takes a Perl 6 file, slurps
107the string out of it, hands it to the parser, then takes the AST out and
108sends its envBody into the evaluator.
109
110=head1 A PROGRAM'S LIFE CYCLE IN DETAIL
111
112Earlier we discussed how eventually what the parser emits is fed to the
113evaluator. Now we'll look at the details and special cases more closely.
114
115As we've seen before, the runtime calls the parser on the Perl code, and it,
116in turn, generates an AST. Most parsed things result in trivial structures --
117just a representation of the program in something a bit more manipulable
118than a string of source code.
119
120This basic structure, a node of the AST, is called an C<Exp> - an expression.
121It represents the combination of values and operation, and the evaluator
122knows to boil it down into a C<Val>.
123
124Matters get a little more complex when the code not only I<is> something at
125compile time, but actually I<does> something, like declarations of variables
126which create the variables, or C<BEGIN> blocks which execute code at compile
127time.
128
129=head2 Enter unsafe unsafeEvalExp
130
131The parser is pure in that it does not affect the outside world when it does
132its thing. It constructs the AST, but not much more.
133
134In order to execute things like C<BEGIN> blocks there are exceptions to
135this.
136
137    BEGIN { print "compile time" };
138
139This operation has side effects - it causes the world outside the pugs
140interpreter to change. However, it must happen within the "pure" parser, and
141Haskell does not normally allow these things.
142
143The C<unsafe> in the name denotes that an effort was made to not care about
144that bit of safety, and do IO in the pure parser anyway.
145
146But it does not strictly mean IO - what C<unsafeEvalExp> is just a short
147circuit from the parser to the evaluator, allowing code to run at compile
148time.
149
150C<BEGIN> blocks are evaluated by calling C<unsafeEvalExp> on the resulting
151C<Exp> immediately after the block finished parsing, and then replacing that
152point in the syntax tree with a the value the block was reduced to.
153
154Declarations of sorts create a node in the syntax tree called a C<Syn>.
155C<Syn>s represents syntactic constructs of sorts, amongst which are variable
156declarations. When evaluated, variable declarations create a type of C<Exp>
157that will modify the C<Env>, adding a new symbol, and then roll back the
158change later. They are also evaluated immediately using C<unsafeEvalExp>.
159
160Other C<Syn>s include control flow structure, and various keywords, but they
161will be discussed later.
162
163=head2 reduce :: Exp -> Eval Val
164
165The heading of this section is the type declaration for the evaluator's
166C<reduce> function.
167
168Let's break it down.
169
170The C<Exp> means that the single argument C<reduce> accepts is an expression.
171The C<Eval> is the monadic fudgeting of the C<Val> type, indicating that the
172reduction process of the C<Val> from the C<Exp> is controlled by the C<Eval>
173monad.
174
175Lets try to explain this with an example:
176
177    reduce (Val v) = reduceVal v
178    reduceVal v = retVal v
179
180This form of reduce takes the expression that is just a value, like C<3> or
181C<"foo"> and encapsulates the data into the C<Eval> monad using C<retVal>.
182
183    reduce (Var name) = reduceVar name
184    reduceVar name = do
185        v <- findVar name
186        case v of
187            Just var -> evalRef var
188            _ | (':':rest) <- name -> return $ VType (mkType rest)
189            _ -> retError "Undeclared variable" name
190
191This reduction takes an expression like C<$a> and reduces it into a value. Here
192the C<Eval> monad's purpose comes into play a bit more clearly.
193
194The first line finds the container named by C<name> using the C<findVar>
195function.
196
197The C<Eval> monad is in use because such an operation might fail - in this
198case, the variable does not exist. The second line throws an exception when
199that happens.
200
201Lastly, if everything went OK, the container is dereferenced to return the
202value it contains. Here is the type signature of C<evalRef>, for reference:
203
204    evalRef :: VRef -> Eval Val
205
206=head2 Many different reductions
207
208F<src/Pugs/Eval.hs> contains 11 different variants of reduce, which dispatch
209to over 60 sub-variations. Each one serves a different purpose, and most are
210pretty straightforward.
211
212For example C<reduce (Syn "env" [])> is the reduction that takes care of
213variable declaration using C<VControl>, while C<reduce (Cxt cxt exp)> forces
214the subexpression C<exp> to be evaluated in the context C<cxt>.
215
216Let's look at some of the more interesting C<reduce>s. My personal favourite is
217C<for>.
218
219It is defined in the C<reduceSyn name exps> declaration, which is the
220reduction of the various syntatic constructs. It uses Haskell's pattern
221matching to invoke the appropriate reduction variant for the values of C<name>.
222Here's the case C<for>, annotated:
223
224    reduceSyn "for" [list, body] = do
225
226This takes the two expressions to the C<for> syntax thingy, the list part, and
227the body. C<for (@list) { print "i'm the body" }>.
228
229The body is actually a subroutine; we'll look at that in a bit. After that
230line are some details which we don't care about right now. Let's pretend they
231don't exist and jump down to
232
233    let arity = max 1 $ length (subParams sub)
234
235That part determines how many elements to take out of C<list> for each
236iteration of C<body>. After that comes a lexically scoped function
237definition, C<runBody>. Let's analyze it.
238
239    runBody [] _ = retVal undef
240
241This takes care the case where there are no more elements in C<list>.
242Contrast it with:
243
244    runBody vs sub' = do
245        let (these, rest) = arity `splitAt` vs
246        genSymCC "&next" $ \symNext -> do
247            genSymPrim "&redo" (const $ runBody vs sub') $ \symRedo -> do
248                apply (updateSubPad sub' (symRedo . symNext)) Nothing $
249                    map (Val . VRef . MkRef) these
250        runBody rest sub'
251
252Which matches C<vs> that isn't an empty list (the first C<runBody> matched
253that case). The C<splitAt> takes C<arity> elements out of C<vs>, that is the
254number of parameters the body subroutine wants, and puts them in C<these>. The
255rest go into C<rest>.
256
257The lines after that set the C<&redo> and C<&next> variables so that they
258contain functions which will control the flow of the current iteration of the
259loop.
260
261The line starting with C<apply> applies the subroutine currently in C<sub'>,
262and gives it C<these> as its parameters on the line starting with C<map>.
263
264Lastly, after the subroutine is applied, C<runBody> is run again on C<rest>.
265
266    genSymCC "&last" $ \symLast -> do
267        let munge sub | subParams sub == [defaultArrayParam] =
268                munge sub{ subParams = [defaultScalarParam] }
269            munge sub = updateSubPad sub symLast
270
271Outside of C<runBody>'s definition, the C<&last> variable is also defined. It
272controls the whole loop, not only a single step, so it doesn't need to be in
273C<&runBody>.
274
275When all the auxiliary functions have been defined, we can run the body with
276the list passed into the for (munging into C<elms> omitted):
277
278    runBody elms $ munge sub
279
280=head2 VCode application
281
282Now that we've seen a nice example of how a subroutine (which might be
283masquerading as a simple block) is used, lets see how C<VCode>, the value
284representing closures (subroutines, blocks, coroutines, etc) is called.
285
286
287Subroutine application can be very simple, in the case of a C<Prim>. At other
288times it involves entering a lexical scope, due to block open. Sometimes
289parameter binding is involved too.
290
291But have no fear, we will soon see that like most parts of pugs, these
292things are actually pretty simple.
293
294=cut
Note: See TracBrowser for help on using the browser.