This manual describes the programming language Cool: the Classroom Object-Oriented Language. Cool is a small language that can be implemented with reasonable effort in a one-semester course. Still, Cool retains many of the features of modern programming languages including objects, static typing, and automatic memory management.
Cool programs are sets of classes. A class encapsulates the variables and procedures of a data type. Instances of a class are objects. In Cool, classes and types are identified. That is, every class defines a type. Classes permit programmers to define new types and associated procedures (or methods) specific to those types. Inheritance allows new types to extend the behavior of existing types.
Cool is an expression language. Most Cool constructs are expressions, and every expression has a value and a type. Cool is type safe: procedures are guaranteed to be applied to data of the correct type. While static typing imposes a strong discipline on programming in Cool, it guarantees that no runtime type errors can arise in the execution of Cool programs.
This manual is divided into informal and formal components. For a short, informal overview, the first half (through Section 9) suffices. The formal description begins with Section 10.
Cool source files have extension .cl. The programming projects will also define other file formats related to Cool, but they are not officially part of the language specification.
You can obtain the Cool interpreter from the course website. To interpret (i.e., run) a Cool program:
cool file.cl
This official version is often called the reference interpreter, and you are encouraged to use it as a point of comparison when you are designing and testing parts of the course project. The reference Cool interpreter has been specifically structured so that you can run the various stages (i.e., lexing, parsing, type-checking and interpreting) independently. This power is useful for PA2 through PA5.
The Cool interpreter has a number of command-line options:
You may encounter other University uses of Cool on the web that mention programs such as coolc and spim. Those tools are used for a course on compilers; this is a course on interpreters. We will not use coolc or spim. In addition, we use a slightly different version of the Cool language specification, so comparing results against external tools may not be helpful.
All code in Cool is organized into classes. Each class definition must be contained in a single source file, but multiple classes may be defined in the same file. Class definitions have the form:
class <type> [ inherits <type> ] { <feature_list> };
The notation [ ... ] denotes an optional construct. All class names are globally visible. Class names begin with an uppercase letter. Classes may not be redefined.
The body of a class definition consists of a list of feature definitions. A feature is either an attribute or a method. An attribute of class A specifies a variable that is part of the state of objects of class A. A method of class A is a procedure that may manipulate the variables and objects of class A.
One of the major themes of modern programming languages is information hiding, which is the idea that certain aspects of a data type's implementation should be abstract and hidden from users of the data type. Cool supports information hiding through a simple mechanism: all attributes have scope local to the class, and all methods have global scope. Thus, the only way to provide access to object state in Cool is through methods.
Feature names must begin with a lowercase letter. No method name may be defined multiple times in a class, and no attribute name may be defined multiple times in a class, but a method and an attribute may have the same name.
A fragment from list.cl illustrates simple cases of both attributes and methods:
class Cons inherits List {
   xcar : Int;
   xcdr : List;
   isNil() : Bool { false };
   init(hd : Int, tl : List) : Cons {
      {
         xcar <- hd;
         xcdr <- tl;
         self;
      }
   };
   ...
};
In this example, the class Cons has two attributes xcar and xcdr and two methods isNil and init. Note that the types of attributes, as well as the types of formal parameters and return types of methods, are explicitly declared by the programmer.
Given object c of class Cons and object l of class List, we can set the xcar and xcdr fields by using the method init:
c.init(1,l)
This notation is object-oriented dispatch. There may be many definitions of init methods in many different classes. The dispatch looks up the class of the object c to decide which init method to invoke. Because the class of c is Cons, the init method in the Cons class is invoked. Within the invocation, the variables xcar and xcdr refer to c's attributes. The special variable self refers to the object on which the method was dispatched, which, in the example, is c itself.
There is a special form new C that generates a fresh object of class C. An object can be thought of as a record that has a slot for each of the attributes of the class as well as pointers to the methods of the class. A typical dispatch for the init method is:
(new Cons).init(1,new Nil)
This example creates a new cons cell and initializes the "car" of the cons cell to be 1 and the "cdr" to be new Nil. There is no mechanism in Cool for programmers to deallocate objects. Cool has automatic memory management; objects that cannot be used by the program are deallocated by a runtime garbage collector.
Attributes are discussed further in Section 5 and methods are discussed further in Section 6.
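As a rough illustration, the fragment above can be completed into a whole program. The List and Nil classes, the car accessor, and the Main class below are assumptions added for this sketch; they are not taken from list.cl.

```cool
-- Hypothetical base class: an empty list by default.
class List {
   isNil() : Bool { true };
};

-- Hypothetical empty-list class, matching the "new Nil" used above.
class Nil inherits List { };

class Cons inherits List {
   xcar : Int;
   xcdr : List;
   isNil() : Bool { false };
   init(hd : Int, tl : List) : Cons {
      {
         xcar <- hd;
         xcdr <- tl;
         self;
      }
   };
   -- Hypothetical accessor: attributes are only reachable through methods.
   car() : Int { xcar };
};

class Main inherits IO {
   main() : Object {
      let c : Cons <- (new Cons).init(1, new Nil) in
         out_int(c.car())
   };
};
```

Because xcar is local to Cons, Main can only observe it through the car method; under these assumptions the program should print 1.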
If a class definition has the form
class C inherits P { ... };
then class C inherits the features of P. In this case P is the parent class of C and C is a child class of P. The semantics of C inherits P is that C has all of the features defined in P in addition to its own features. In the case that a parent and child both define the same method name, then the definition given in the child class takes precedence. It is illegal to redefine attribute names. Furthermore, for type safety, it is necessary to place some restrictions on how methods may be redefined (see Section 6).
There is a distinguished class Object. If a class definition does not specify a parent class, then the class inherits from Object by default. A class may inherit only from a single class; this is aptly called "single inheritance." The parent-child relation on classes defines a graph. This graph may not contain cycles. For example, if C inherits from P, then P must not inherit from C. Furthermore, if C inherits from P, then P must have a class definition somewhere in the program. Because Cool has single inheritance, it follows that if both of these restrictions are satisfied, then the inheritance graph forms a tree with Object as the root.
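A minimal sketch of inheritance with a redefined method follows. The class names Shape and Circle and their methods are hypothetical, chosen only for illustration.

```cool
class Shape {
   name() : String { "shape" };
   -- name() is shorthand for self.name(), so the child's definition is used
   -- when self is a Circle.
   describe() : String { "this is a ".concat(name()) };
};

class Circle inherits Shape {
   -- Same number of arguments and same return type as the inherited method.
   name() : String { "circle" };
};

class Main inherits IO {
   main() : Object {
      out_string((new Circle).describe().concat("\n"))
   };
};
```

Circle inherits describe unchanged but overrides name, so the inherited describe should report "this is a circle".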
In addition to Object, Cool has four other basic classes: Int, String, Bool, and IO. The basic classes are discussed in Section 8.
In Cool, every class name is also a type. In addition, there is a type SELF_TYPE that can be used in special circumstances.
A type declaration has the form x:C, where x is a variable and C is a type. Every variable must have a type declaration at the point it is introduced, whether that is in a let, case, or as the formal parameter of a method. The types of all attributes must also be declared.
The basic type rule in Cool is that if a method or variable expects a value of type P, then any value of type C may be used instead, provided that P is an ancestor of C in the class hierarchy. In other words, if C inherits from P, either directly or indirectly, then a C can be used wherever a P would suffice.
When an object of class C may be used in place of an object of class P, we say that C conforms to P or that C ≤ P (think: C is lower down in the inheritance tree). As discussed above, conformance is defined in terms of the inheritance graph.
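Definition 4.1 (Conformance) Let A, C, and P be types.
A ≤ A for all types A
if C inherits from P, then C ≤ P
if A ≤ C and C ≤ P, then A ≤ P
Because Object is the root of the class hierarchy, it follows that A ≤ Object for all types A.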
The type SELF_TYPE is used to refer to the type of the self variable. This is useful in classes that will be inherited by other classes, because it allows the programmer to avoid specifying a fixed final type at the time the class is written. For example, consider the program
class Silly {
   copy() : SELF_TYPE { self };
};

class Sally inherits Silly { };

class Main {
   x : Sally <- (new Sally).copy();

   main() : Sally { x };
};
Because SELF_TYPE is used in the definition of the copy method, we know that the result of copy is the same as the type of the self parameter. Thus, it follows that (new Sally).copy() has type Sally, which conforms to the declaration of attribute x.
Note that the meaning of SELF_TYPE is not fixed, but depends on the class in which it is used. In general, SELF_TYPE may refer to the class C in which it appears, or any class that conforms to C. When it is useful to make explicit what SELF_TYPE may refer to, we use the name of the class C in which SELF_TYPE appears as an index: SELF_TYPE_C. This subscript notation is not part of Cool syntax--it is used merely to make clear in what class a particular occurrence of SELF_TYPE appears.
From Definition 4.1, it follows that SELF_TYPE_X ≤ SELF_TYPE_X. There is also a special conformance rule for SELF_TYPE:
SELF_TYPE_C ≤ P if C ≤ P
Finally, SELF_TYPE may be used in the following places: new SELF_TYPE, as the return type of a method, as the declared type of a let variable, or as the declared type of an attribute. No other uses of SELF_TYPE are permitted.
The Cool type system guarantees at compile time that execution of a program cannot result in runtime type errors. Using the type declarations for identifiers supplied by the programmer, the type checker infers a type for every expression in the program.
It is important to distinguish between the type assigned by the type checker to an expression at compile time, which we shall call the static type of the expression, and the type(s) to which the expression may evaluate during execution, which we shall call the dynamic types.
The distinction between static and dynamic types is needed because the type checker cannot, at compile time, have perfect information about what values will be computed at runtime. Thus, in general, the static and dynamic types may be different. What we require, however, is that the type checker's static types be sound with respect to the dynamic types.
For any expression e, let D_e be a dynamic type of e and let S_e be the static type inferred by the type checker. Then the type checker is sound if for all expressions e it is the case that D_e ≤ S_e.
Put another way, we require that the type checker err on the side of overestimating the type of an expression in those cases where perfect accuracy is not possible. Such a type checker will never accept a program that contains type errors. However, the price paid is that the type checker will reject some programs that would actually execute without runtime errors.
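The following sketch shows a static type that overestimates the dynamic type. The classes P and C and their methods are hypothetical.

```cool
class P {
   one() : Int { 1 };
};

class C inherits P {
   two() : Int { 2 };
};

class Main {
   -- Static type of p is P; at run time p holds an object of dynamic type C.
   p : P <- new C;

   main() : Int {
      -- p.two() would be rejected by the type checker, because the static
      -- type P has no method two, even though the call would succeed at
      -- run time. p.one() is accepted.
      p.one()
   };
};
```

This is the price of soundness described above: a program using p.two() would run without error, yet it is rejected at compile time.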
An attribute definition has the form
<id> : <type> [ <- <expr> ];
The expression is optional initialization that is executed when a new object is created. The static type of the expression must conform to the declared type of the attribute. If no initialization is supplied, then the default initialization is used (see below).
When a new object of a class is created, all of the inherited and local attributes must be initialized. Inherited attributes are initialized first in inheritance order beginning with the attributes of the greatest ancestor class. Within a given class, attributes are initialized in the order they appear in the source text.
Attributes are local to the class in which they are defined or inherited. Inherited attributes cannot be redefined.
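A small sketch of the initialization order follows. The classes Base and Derived and the attribute names are assumptions made for this example.

```cool
class Base {
   a : Int <- 1;              -- initialized first (greatest ancestor)
};

class Derived inherits Base {
   b : Int <- a + 1;          -- a is already initialized: b is 2
   c : Int <- b + 1;          -- source order within the class: c is 3
   get() : Int { c };
};

class Main inherits IO {
   main() : Object {
      out_int((new Derived).get())
   };
};
```

Since inherited attributes are initialized before local ones, and local ones in source order, this should print 3.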
All variables in Cool are initialized to contain values of the appropriate type. The special value void is a member of all types and is used as the default initialization for variables where no initialization is supplied by the user. (void is used where one would use NULL in C or null in Java; Cool does not have anything equivalent to C's or Java's void type.) Note that there is no name for void in Cool; the only way to create a void value is to declare a variable of some class other than Int, String, or Bool and allow the default initialization to occur, or to store the result of a while loop.
There is a special form isvoid expr that tests whether a value is void (see Section 7.11). In addition, void values may be tested for equality. A void value may be passed as an argument, assigned to a variable, or otherwise used in any context where any value is legitimate, except that a dispatch to or case on void generates a runtime error.
Variables of the basic classes Int, Bool, and String are initialized specially; see Section 8.
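The default initializations can be observed with a short program. The attribute names here are hypothetical; the defaults themselves are those stated in this manual.

```cool
class Main inherits IO {
   n : Int;       -- defaults to 0
   s : String;    -- defaults to the empty string
   b : Bool;      -- defaults to false
   o : Object;    -- defaults to void

   main() : Object {
      {
         out_int(n);
         out_string("\n");
         out_int(s.length());
         out_string("\n");
         if b then out_string("b is true\n") else out_string("b is false\n") fi;
         if isvoid o then out_string("o is void\n") else out_string("o is not void\n") fi;
      }
   };
};
```

Note that o is only tested with isvoid; dispatching on it (for example o.type_name()) would be a runtime error.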
A method definition has the form
<id>(<id> : <type>,...,<id> : <type>): <type> { <expr> };
There may be zero or more formal parameters. The identifiers used in the formal parameter list must be distinct. The type of the method body must conform to the declared return type. When a method is invoked, the formal parameters are bound to the actual arguments and the expression is evaluated; the resulting value is the meaning of the method invocation. A formal parameter hides any definition of an attribute of the same name.
To ensure type safety, there are restrictions on the redefinition of inherited methods. The rule is simple: If a class C inherits a method f from an ancestor class P, then C may override the inherited definition of f provided the number of arguments, the types of the formal parameters, and the return type are exactly the same in both definitions.
To see why some restriction is necessary on the redefinition of inherited methods, consider the following example:
class P {
   f(): Int { 1 };
};

class C inherits P {
   f(): String { "1" };
};
Let p be an object with dynamic type P. Then
p.f() + 1
is a well-formed expression with value 2. However, we cannot substitute a value of type C for p, as it would result in adding a string to a number. Thus, if methods can be redefined arbitrarily, then subclasses may not simply extend the behavior of their parents, and much of the usefulness of inheritance, as well as type safety, is lost.
Expressions are the largest syntactic category in Cool.
The simplest expressions are constants. The boolean constants are true and false. Integer constants are unsigned strings of digits such as 0, 123, and 007. String constants are sequences of characters enclosed in double quotes, such as "This is a string." String constants may be at most 1024 characters long. There are other restrictions on strings; see Section 10.
The constants belong to the basic classes Bool, Int, and String. The value of a constant is an object of the appropriate basic class.
The names of local variables, formal parameters of methods, self, and class attributes are all expressions. The identifier self may be referenced, but it is an error to assign to self or to bind self in a let, a case, or as a formal parameter. It is also illegal to have attributes named self.
Local variables and formal parameters have lexical scope. Attributes are visible throughout a class in which they are declared or inherited, although they may be hidden by local declarations within expressions. An identifier reference is bound to the declaration in the innermost scope that contains a declaration for that identifier, or to the attribute of the same name if there is no other declaration. The exception to this rule is the identifier self, which is implicitly bound in every class.
An assignment has the form
<id> <- <expr>
The static type of the expression must conform to the declared type of the identifier. The value is the value of the expression. The static type of an assignment is the static type of <expr>.
There are three forms of dispatch (i.e. method call) in Cool. The three forms differ only in how the called method is selected. The most commonly used form of dispatch is
<expr>.<id>(<expr>,...,<expr>)
Consider the dispatch e0.f(e1,...,en). To evaluate this expression, the arguments are evaluated in left-to-right order, from e1 to en. Next, e0 is evaluated and its class C noted (if e0 is void a runtime error is generated). Finally, the method f in class C is invoked, with the value of e0 bound to self in the body of f and the actual arguments bound to the formals as usual. The value of the expression is the value returned by the method invocation.
Type checking a dispatch involves several steps. Assume e0 has static type A. (Recall that this type is not necessarily the same as the type C above. A is the type inferred by the type checker; C is the class of the object computed at runtime, which is potentially any subclass of A.) Class A must have a method f, the dispatch and the definition of f must have the same number of arguments, and the static type of the ith actual parameter must conform to the declared type of the ith formal parameter.
If f has return type B and B is a class name, then the static type of the dispatch is B. Otherwise, if f has return type SELF_TYPE, then the static type of the dispatch is A. To see why this is sound, note that the self parameter of the method f conforms to type A. Therefore, because f returns SELF_TYPE, we can infer that the result must also conform to A. Inferring accurate static types for dispatch expressions is what justifies including SELF_TYPE in the Cool type system.
The other forms of dispatch are:
<id>(<expr>,...,<expr>)
<expr>@<type>.<id>(<expr>,...,<expr>)
The first form is shorthand for self.<id>(<expr>,...,<expr>).
The second form provides a way of accessing methods of parent classes that have been hidden by redefinitions in child classes. Instead of using the class of the leftmost expression to determine the method, the method of the class explicitly specified is used. For example, e@B.f() invokes the method f in class B on the object that is the value of e. For this form of dispatch, the static type to the left of "@" must conform to the type specified to the right of "@".
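The three dispatch forms can be compared side by side. The classes P and C and the method names below are hypothetical.

```cool
class P {
   greet() : String { "hello from P" };
};

class C inherits P {
   greet() : String { "hello from C" };
   -- Static dispatch on self: SELF_TYPE conforms to P, so this is allowed.
   parentGreet() : String { self@P.greet() };
   -- Shorthand form: greet() means self.greet().
   ownGreet() : String { greet() };
};

class Main inherits IO {
   main() : Object {
      let c : C <- new C in
         {
            out_string(c.greet().concat("\n"));         -- uses C's definition
            out_string(c@P.greet().concat("\n"));       -- uses P's definition
            out_string(c.parentGreet().concat("\n"));   -- uses P's definition
            out_string(c.ownGreet().concat("\n"));      -- uses C's definition
         }
   };
};
```

The expression c@P.greet() type checks because the static type C to the left of "@" conforms to P.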
A conditional has the form
if <expr> then <expr> else <expr> fi
The semantics of conditionals is standard. The predicate is evaluated first. If the predicate is true, then the then branch is evaluated. If the predicate is false, then the else branch is evaluated. The value of the conditional is the value of the evaluated branch.
The predicate must have static type Bool. The branches may have any static types. To specify the static type of the conditional, we define an operation ⊔ (pronounced "join") on types as follows. Let A and B be any types other than SELF_TYPE. The least upper bound of a set of types is the least type, with respect to the conformance relation ≤, to which every type in the set conforms. Thus A ⊔ B is the least type C such that A ≤ C and B ≤ C.
Let T and F be the static types of the branches of the conditional. Then the static type of the conditional is T ⊔ F. (think: Walk towards Object from each of T and F until the paths meet.)
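A brief sketch of how the join determines the type of a conditional, using hypothetical classes A, B, and C:

```cool
class A { };
class B inherits A { };
class C inherits A { };

class Main {
   flag : Bool <- true;

   -- The branches have static types B and C. Their join is A, the nearest
   -- common ancestor, so a declared return type of A is accepted. A declared
   -- return type of B would be rejected, even though flag is true here.
   main() : A {
      if flag then new B else new C fi
   };
};
```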
A loop has the form
while <expr> loop <expr> pool
The predicate is evaluated before each iteration of the loop. If the predicate is false, the loop terminates and void is returned. If the predicate is true, the body of the loop is evaluated and the process repeats.
The predicate must have static type Bool. The body may have any static type. The static type of a loop expression is Object.
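For instance, a counting loop might be sketched as follows (the variable name i is arbitrary):

```cool
class Main inherits IO {
   -- The loop has static type Object, which matches the declared return type.
   main() : Object {
      let i : Int <- 0 in
         while i < 3 loop
            {
               out_int(i);
               out_string("\n");
               i <- i + 1;
            }
         pool
   };
};
```

This should print 0, 1, and 2 on separate lines; the value of the whole loop expression is void.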
A block has the form
{ <expr>; ... <expr>; }
The expressions are evaluated in left-to-right order. Every block has at least one expression; the value of a block is the value of the last expression. The expressions of a block may have any static types. The static type of a block is the static type of the last expression.
An occasional source of confusion in Cool is the use of semi-colons (";"). Semi-colons are used as terminators in lists of expressions (e.g., the block syntax above) and not as expression separators. Semi-colons also terminate other Cool constructs; see Section 11 for details.
A let expression has the form
let <id1> : <type1> [ <- <expr1> ], ..., <idn> : <typen> [ <- <exprn> ] in <expr>
The optional expressions are initialization; the other expression is the body. A let is evaluated as follows. First <expr1> is evaluated and the result bound to <id1>. Then <expr2> is evaluated and the result bound to <id2>, and so on, until all of the variables in the let are initialized. (If the initialization of <idk> is omitted, the default initialization of type <typek> is used.) Next the body of the let is evaluated. The value of the let is the value of the body.
The let identifiers <id1>,...,<idn> are visible in the body of the let. Furthermore, identifiers <id1>,...,<idk> are visible in the initialization of <idm> for any m > k.
If an identifier is defined multiple times in a let, later bindings hide earlier ones. Identifiers introduced by let also hide any definitions for the same names in containing scopes. Every let expression must introduce at least one identifier.
The type of an initialization expression must conform to the declared type of the identifier. The type of let is the type of the body.
The <expr> of a let extends as far (encompasses as many tokens) as the grammar allows.
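The visibility and hiding rules can be seen in a short sketch (identifier names are arbitrary):

```cool
class Main inherits IO {
   main() : Object {
      let x : Int <- 2,
          y : Int <- x + 1,     -- x is visible here: y is 3
          x : Int <- y * 10     -- this later binding hides the first x: 30
      in
         out_int(x + y)         -- the body sees the second x
   };
};
```

With these bindings the body computes 30 + 3, so the program should print 33.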
A case expression has the form
case <expr0> of <id1> : <type1> => <expr1>; ... <idn> : <typen> => <exprn>; esac
Case expressions provide runtime type tests on objects. First, <expr0> is evaluated and its dynamic type C noted (if <expr0> evaluates to void a run-time error is produced). Next, from among the branches the branch with the least type <typek> such that C ≤ <typek> is selected. The identifier <idk> is bound to the value of <expr0> and the expression <exprk> is evaluated. The result of the case is the value of <exprk>. If no branch can be selected for evaluation, a run-time error is generated. Every case expression must have at least one branch.
For each branch, let Ti be the static type of <expri>. The static type of a case expression is T1 ⊔ ... ⊔ Tn. The identifier id introduced by a branch of a case hides any variable or attribute definition for id visible in the containing scope.
The case expression has no special construct for a "default" or "otherwise" branch. The same effect is achieved by including a branch
x : Object => ...
because every type conforms to Object.
The case expression provides programmers a way to insert explicit runtime type checks in situations where static types inferred by the type checker are too conservative. A typical situation is that a programmer writes an expression e and type checking infers that e has static type P. However, the programmer may know that, in fact, the dynamic type of e is always C for some C ≤ P. This information can be captured using a case expression:
case e of x : C => ...
In the branch the variable x is bound to the value of e but has the more specific static type C.
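A complete case expression, including a default branch, might look like the sketch below. The describe method is hypothetical.

```cool
class Main inherits IO {
   describe(o : Object) : String {
      case o of
         i : Int    => "an Int";
         s : String => "a String";
         x : Object => "some other object";   -- acts as the default branch
      esac
   };

   main() : Object {
      {
         out_string(describe(1).concat("\n"));
         out_string(describe("hi").concat("\n"));
         out_string(describe(self).concat("\n"));
      }
   };
};
```

For the argument 1 both the Int and Object branches apply, and the Int branch is chosen because Int is the least such type. Every branch here has static type String, so the case expression has static type String.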
A new expression has the form
new <type>
The value is a fresh object of the appropriate class. If the type is SELF_TYPE, then the value is a fresh object of the class of self in the current scope. The static type is <type>.
The expression
isvoid expr
evaluates to true if expr is void and evaluates to false if expr is not void.
Cool has four binary arithmetic operations: +, -, *, /. The syntax is
expr1 <op> expr2
To evaluate such an expression, first expr1 is evaluated and then expr2. The result of the operation is the result of the expression.
The static types of the two sub-expressions must be Int. The static type of the entire arithmetic expression is also Int. Cool has only integer division.
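Cool has three comparison operations: <, <=, =. These comparisons may be applied to subexpressions of any types, subject to the following rules:
If either subexpression has static type Int, Bool, or String, then the other must have the same static type.
Any other types, including SELF_TYPE, may be freely compared.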
In all cases, the result of the comparison is a Bool. See the type checking rules for more information.
In principle, there is nothing wrong with permitting equality tests between, for example, Bool and Int. However, such a test must always be false and almost certainly indicates some sort of programming error. The Cool type checking rules catch such errors at compile-time instead of waiting until runtime.
On non-basic objects, equality is decided via pointer equality (i.e., whether the memory addresses of the objects are the same). Equality is defined for void: two void values are equal and a void value is never equal to a non-void value. See the operational semantics rules for more information.
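The equality rules can be exercised with a small sketch. The Box class and the show helper are hypothetical.

```cool
class Box { };

class Main inherits IO {
   show(b : Bool) : Object {
      if b then out_string("true\n") else out_string("false\n") fi
   };

   main() : Object {
      let a : Box <- new Box,
          b : Box <- a,          -- same object as a
          c : Box <- new Box,    -- a distinct object
          v : Box                -- no initialization: void
      in
         {
            show(a = b);         -- pointer equality: same object
            show(a = c);         -- two distinct objects
            show(v = a);         -- void compared with a non-void value
            show(1 + 1 = 2);     -- both sides have static type Int
         }
   };
};
```

Under the rules above, the first and last comparisons should be true and the middle two false. A comparison such as a = 1 would be rejected by the type checker, since one side is an Int and the other is not.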
Finally, there is one unary arithmetic and one unary logical operator.
The Object class is the root of the inheritance graph. Even the other basic classes (e.g., IO and Int) inherit from Object (and thus inherit the three methods listed below). It is an error to redefine Object. Methods with the following declarations are defined:
abort() : Object
type_name() : String
copy() : SELF_TYPE
The method abort flushes all output and then halts program execution with an error message.
The method type_name returns a string with the name of the (run-time, dynamic) class of the object.
The method copy produces a copy of the object.
The IO class provides the following methods for performing simple input and output operations:
out_string(x : String) : SELF_TYPE
out_int(x : Int) : SELF_TYPE
in_string() : String
in_int() : Int
The methods out_string and out_int print their argument, flush the standard output, and return their self parameter.
The interpreter or compiler changes every \t to a tab and every \n to a newline in the argument to out_string before emitting the resulting string. Note that this is different from normal escape sequence handling, where \n would be a single character stored in the string. In Cool, it is two characters, but out_string prints a newline instead of \n.
The method in_string reads a string from the standard input, up to but not including a newline character or the end of file. The newline character is consumed but is not made part of the returned string. If an error occurs then in_string returns "", the string of length 0. Note that while literal lexical string constants are limited to size 1024, strings generated by in_string (or concat, etc.) can be of arbitrary size. There is no special processing of the two-character sequences \n or \t (or, indeed, \") during in_string. Errors include:
The method in_int reads a single possibly-signed integer, which may be preceded by whitespace. Any characters following the integer, up to and including the next newline, are discarded by in_int. If an error occurs then in_int returns 0. Errors include:
A class can make use of the methods in the IO class by inheriting from IO. It is an error to redefine the IO class.
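Because out_string and out_int return SELF_TYPE, output calls can be chained. A minimal sketch:

```cool
class Main inherits IO {
   main() : Object {
      -- out_string("x = ") is shorthand for self.out_string("x = "); its
      -- static type is SELF_TYPE, so the IO methods remain available on
      -- the result.
      out_string("x = ").out_int(42).out_string("\n")
   };
};
```

Had out_string been declared to return Object instead, the second call in the chain would not type check.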
The Int class provides integers. There are no methods special to Int. The default initialization for variables of type Int is 0 (not void). It is an error to inherit from or redefine Int.
The String class provides strings. The following methods are defined:
length() : Int
concat(s : String) : String
substr(i : Int, l : Int) : String
The method length returns the length of the self parameter. The method concat returns the string formed by concatenating s after self. The method substr returns the substring of its self parameter beginning at position i with length l; string positions are numbered beginning at 0. A runtime error is generated if the specified substring is out of range. Substring errors are always reported as taking place on line 0.
The default initialization for variables of type String is "" (not void). It is an error to inherit from or redefine String.
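The three String methods can be tried out as follows (the string contents are arbitrary):

```cool
class Main inherits IO {
   main() : Object {
      let s : String <- "Hello".concat(", world") in
         {
            out_int(s.length());           -- "Hello, world" has 12 characters
            out_string("\n");
            out_string(s.substr(0, 5));    -- positions 0 through 4: "Hello"
            out_string("\n");
         }
   };
};
```

A call such as s.substr(10, 5) would run past the end of the string and produce a runtime error.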
The Bool class provides true and false. The default initialization for variables of type Bool is false (not void). It is an error to inherit from or redefine Bool.
Every program must have a class Main. Furthermore, the Main class must have a method main that takes no formal parameters. The main method may be defined in class Main or it may be inherited from another class. A program is executed by evaluating (new Main).main().
The remaining sections of this manual provide a more formal definition of Cool. There are four sections covering lexical structure (Section 10), grammar (Section 11), type rules (Section 12), and operational semantics (Section 13).
The lexical units of Cool are integers, type identifiers, object identifiers, special notation, strings, keywords, and white space.
Integers are non-empty strings of digits 0-9. It is a lexer error if a literal integer constant is too big to be represented as a 32-bit signed integer. 32-bit signed integers range from -2,147,483,648 to +2,147,483,647. Cool integer constants are always non-negative, so valid integer constants range from 0 to 2,147,483,647.
Identifiers are strings (other than keywords) consisting of letters, digits, and the underscore character. Type identifiers begin with a capital letter; object identifiers begin with a lower case letter. Identifiers are case sensitive.
self and SELF_TYPE are treated specially by Cool but are not treated as keywords. self should be reported by the lexer as an identifier and SELF_TYPE should be reported by the lexer as a type. Both are case sensitive.
The special syntactic symbols (e.g., parentheses, assignment operator, etc.) are given in Figure 1.
Strings are enclosed in double quotes "...". Within a string, a sequence '\c' denotes the two characters '\' and 'c', with the exception of the following:
\t   tab
\n   newline
The two-character sequences \n and \t are called escape sequences. Other escape sequences like \r (carriage return) are not part of Cool. These two special escape sequences should not be interpreted or transformed by the lexer; they are handled by the IO module and the run-time system.
A newline character may not appear in a string:
"This is not
OK"
A string may contain embedded double quotes, so long as they are escaped. The following is a valid Cool string:
"David St. Hubbins said, \"It's such a fine line between stupid, and clever.\""
Note that Cool's interpretation of \" may not be what you are expecting. The two-character sequence \" (which is not an escape sequence) does not become " in any sense. Instead, it stays \". This is different from most other languages, but simplifies lexing and interpreting. Example:
class Main inherits IO {
  main() : Object {
    out_string("She said, \"Hello.\"\n")
  } ;
} ;
A string may not contain EOF; strings cannot cross file boundaries. A string may not contain NUL, the character with ASCII value 0. The lexer must reject source text that contains malformed strings.
A string may contain the two-character sequence \0 (backslash zero). However, that sequence does not have any special meaning -- it just yields a backslash followed by a zero inside the string.
The single character with converted integer value zero (the NUL) is not allowed. Any other character may be included in a string.
There are two forms of comments in Cool. Any characters between two dashes -- and the next newline (or EOF, if there is no next newline) are treated as comments. Comments may also be written by enclosing text in (* ... *). The latter form of comment may be nested but may not contain EOF.
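Both comment forms, including nesting, are shown in this small sketch:

```cool
-- A line comment runs to the end of the line.
(* A block comment may span lines
   and may contain (* a nested comment *) before it closes. *)
class Main {
   main() : Int { 0 };   -- a comment may follow code on the same line
};
```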
Comments cannot cross file boundaries.
The keywords of Cool are: case, class, else, esac, false, fi, if, in, inherits, isvoid, let, loop, new, not, of, pool, then, true, while. Except for the constants true and false, keywords are case insensitive.
To conform to the rules for other objects, the first letter of true and false must be lowercase; the trailing letters may be upper or lower case. Thus True is not a keyword but a type identifier.
White space consists of any sequence of the characters: blank (ascii 32), \n (newline, ascii 10), \f (form feed, ascii 12), \r (carriage return, ascii 13), \t (tab, ascii 9), \v (vertical tab, ascii 11).
Figure 1 provides a specification of Cool syntax. The specification is not in pure Backus-Naur Form (BNF); for convenience, we also use some regular expression notation. Specifically, A* means zero or more A's in succession; A+ means one or more A's. Items in square brackets [...] are optional. Double brackets [[ ]] are not part of Cool; they are used in the grammar as a meta-symbol to show association of grammar symbols (e.g. a[[bc]]+ means a followed by one or more bc pairs).
The precedence of infix binary and prefix unary operations, from highest to lowest, is given by the following table:
.
@
~
isvoid
* /
+ -
<= < =
not
<-
All binary operations are left-associative, with the exception of assignment, which is right-associative, and the three comparison operations, which do not associate.
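A few expressions illustrate how the precedence table resolves grouping; the comments show the implied parenthesization. The variable names are arbitrary.

```cool
class Main inherits IO {
   main() : Object {
      let x : Int, b : Bool in
         {
            x <- 1 + 2 * 3;       -- x <- (1 + (2 * 3)), so x is 7
            b <- not x < 10;      -- b <- (not (x < 10)), so b is false
            out_int(~x + 1);      -- ((~x) + 1), which is -6
            out_string("\n");
            out_int(10 - 4 - 3);  -- left-associative: ((10 - 4) - 3) is 3
            out_string("\n");
         }
   };
};
```

Because comparisons do not associate, an expression such as 1 < 2 < 3 is not accepted by the grammar.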
This section formally defines the type rules of Cool. The type rules define the type of every Cool expression in a given context. The context is the type environment, which describes the type of every unbound identifier appearing in an expression. The type environment is described in Section 12.1. Section 12.2 gives the type rules.
To a first approximation, type checking in Cool can be thought of as a bottom-up algorithm: the type of an expression e is computed from the (previously computed) types of e's subexpressions. For example, an integer 1 has type Int; there are no subexpressions in this case. As another example, if e_n has type X, then the expression { e_1; ...; e_n; } has type X.
A complication arises in the case of an expression v, where v is an object identifier. It is not possible to say what the type of v is in a strictly bottom-up algorithm; we need to know the type declared for v in the larger expression. Such a declaration must exist for every object identifier in valid Cool programs.
To capture information about the types of identifiers, we use a type environment. The environment consists of three parts: a method environment M, an object environment O, and the name of the current class in which the expression appears. The method environment and object environment are both functions (also called mappings). The object environment is a function of the form
$$O(v) = T$$
which assigns the type T to object identifier v. The method environment is more complex; it is a function of the form
$$M(C, f) = (T_1, \ldots, T_{n-1}, T_n)$$
where C is a class name (a type), f is a method name, and T_1, ..., T_n are types. The tuple of types is the signature of the method. The interpretation of signatures is that in class C the method f has formal parameters of types (T_1, ..., T_{n-1}), in that order, and a return type T_n.
Two mappings are required instead of one because object names and method names do not clash---i.e., there may be a method and an object identifier of the same name.
The third component of the type environment is the name of the current class, which is needed for type rules involving SELF_TYPE.
Every expression $e$ is type checked in a type environment; the subexpressions of $e$ may be type checked in the same environment or, if $e$ introduces a new object identifier, in a modified environment. For example, consider the expression
let c : Int <- 33 in ...
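The let expression introduces a new variable c with type Int. Let $O$ be the object component of the type environment for the let. Then the body of the let is type checked in the object type environment
$$O[\mathrm{Int}/c]$$
where the notation $O[T/c]$ is defined as follows:
$$O[T/c](c) = T$$
$$O[T/c](d) = O(d) \quad \text{if } d \neq c$$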
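As a rough illustration, extending an object environment can be modelled as a copy-and-update on a dictionary. The sketch below is in Python; the dictionary representation and the helper name `extend` are assumptions made for this example and are not part of Cool.

```python
from typing import Dict

# Hypothetical representation: object identifier -> declared type name.
ObjectEnv = Dict[str, str]


def extend(env: ObjectEnv, name: str, type_name: str) -> ObjectEnv:
    """Return a new environment equal to env except that name maps to type_name."""
    new_env = dict(env)        # the enclosing environment is left untouched
    new_env[name] = type_name  # the new binding hides any outer binding
    return new_env


outer = {"c": "String", "d": "Bool"}
# Environment used for the body of: let c : Int <- 33 in ...
inner = extend(outer, "c", "Int")
```

Copying rather than mutating keeps the outer environment valid once the body of the let has been checked.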
The general form of a type checking rule is:
$$\frac{\vdots}{O, M, C \vdash e : T}$$
The rule should be read: In the type environment for objects $O$, methods $M$, and containing class $C$, the expression $e$ has type $T$. The dots above the horizontal bar stand for other statements about the types of sub-expressions of $e$. These other statements are hypotheses of the rule; if the hypotheses are satisfied, then the statement below the bar is true. In the conclusion, the "turnstile" ("$\vdash$") separates context ($O, M, C$) from statement ($e : T$).
The rule for object identifiers is simply that if the environment assigns an identifier $\mathit{Id}$ type $T$, then $\mathit{Id}$ has type $T$.
$$\frac{O(\mathit{Id}) = T}{O, M, C \vdash \mathit{Id} : T}$$
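The rule for assignment to a variable is more complex:
$$\frac{O(\mathit{Id}) = T \qquad O, M, C \vdash e_1 : T' \qquad T' \leq T}{O, M, C \vdash \mathit{Id} \leftarrow e_1 : T'}$$
Note that this type rule, as well as others, uses the conformance relation $\leq$ (see Section 3.2). The rule says that the assigned expression $e_1$ must have a type $T'$ that conforms to the type $T$ of the identifier $\mathit{Id}$ in the type environment. The type of the whole expression is $T'$. The type rules for constants are all easy:
$$O, M, C \vdash \mathrm{true} : \mathrm{Bool}$$
$$O, M, C \vdash \mathrm{false} : \mathrm{Bool}$$
$$\frac{i \text{ is an integer constant}}{O, M, C \vdash i : \mathrm{Int}}$$
$$\frac{s \text{ is a string constant}}{O, M, C \vdash s : \mathrm{String}}$$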
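The assignment rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the inheritance table `PARENT`, the helper names, and the sample classes are hypothetical, and SELF_TYPE is not modelled.

```python
from typing import Dict, Optional

# Hypothetical inheritance table (child -> parent); Object is the root.
PARENT: Dict[str, Optional[str]] = {
    "Object": None,
    "Int": "Object",
    "B": "Object",
    "A": "B",
}


def conforms(sub: str, sup: str) -> bool:
    """True if sub is sup or inherits from it, directly or indirectly."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT[current]
    return False


def check_assign(obj_env: Dict[str, str], ident: str, rhs_type: str) -> str:
    """Return the type of an assignment whose right-hand side has type rhs_type."""
    declared = obj_env[ident]  # the identifier must be declared
    if not conforms(rhs_type, declared):
        raise TypeError(f"{rhs_type} does not conform to {declared}")
    return rhs_type  # the assignment takes the type of its right-hand side


env = {"x": "B", "n": "Int"}
result = check_assign(env, "x", "A")  # accepted: A inherits from B
```

Note that the result is the type of the assigned expression, not the declared type of the identifier.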
There are two cases for new, one for new SELF_TYPE and one for any other form:
Dispatch expressions are the most complex to type check.
To type check a dispatch, each of the subexpressions must first be type checked. The type $T_0$ of the receiver expression $e_0$ determines which declaration of the method $f$ is used. The argument types of the dispatch must conform to the declared argument types. Note that the type of the result of the dispatch is either the declared return type or $T_0$ in the case that the declared return type is SELF_TYPE. The only difference in type checking a static dispatch is that the class $T$ of the method $f$ is given in the dispatch, and the type $T_0$ must conform to $T$.
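The steps for an ordinary dispatch can be sketched as follows. This Python fragment is a simplified illustration, not the manual's formal rule: the tables `PARENT` and `METHODS` and the function names are hypothetical, inherited methods are listed explicitly for each class, and a receiver whose own type is SELF_TYPE is not handled.

```python
from typing import Dict, List, Optional, Tuple

PARENT: Dict[str, Optional[str]] = {
    "Object": None, "Int": "Object", "String": "Object", "B": "Object", "A": "B",
}

# Hypothetical method environment: (class, method) -> (T1, ..., Tn-1, Tn).
METHODS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("B", "f"): ("Int", "Int"),
    ("A", "f"): ("Int", "Int"),
    ("B", "copy"): ("SELF_TYPE",),
    ("A", "copy"): ("SELF_TYPE",),
}


def conforms(sub: str, sup: str) -> bool:
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT[current]
    return False


def check_dispatch(receiver_type: str, method: str, arg_types: List[str]) -> str:
    """Return the static type of a dispatch on a receiver of receiver_type."""
    *formals, declared_return = METHODS[(receiver_type, method)]
    if len(arg_types) != len(formals):
        raise TypeError("wrong number of arguments")
    for actual, formal in zip(arg_types, formals):
        if not conforms(actual, formal):
            raise TypeError(f"{actual} does not conform to {formal}")
    # A declared return type of SELF_TYPE yields the receiver's type.
    return receiver_type if declared_return == "SELF_TYPE" else declared_return
```

A static dispatch would add one more check, that the receiver type conforms to the class named in the dispatch, and would look the method up in that class instead.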
The type checking rules for if and block ({ ... }) expressions are straightforward. See Section 7.5 for the definition of the join ($\sqcup$) operation.
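Assuming the operation referred to here is the least upper bound of two types in the inheritance tree, it can be computed by walking both ancestor chains. The following Python sketch uses a hypothetical `PARENT` table and function names chosen for this example.

```python
from typing import Dict, List, Optional

PARENT: Dict[str, Optional[str]] = {
    "Object": None, "Int": "Object", "String": "Object", "B": "Object", "A": "B",
}


def ancestors(cls: str) -> List[str]:
    """Return cls followed by its ancestors, ending at Object."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def join(a: str, b: str) -> str:
    """Return the closest common ancestor of a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # walks upward from b
        if candidate in seen:
            return candidate
    raise ValueError("no common ancestor")  # cannot happen when Object is the root
```

Under this reading, the type of an if expression is the join of the types of its two branches, so a conditional returning an A in one branch and an Int in the other has type Object.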
The let rule has some interesting aspects.
First, the initialization $e_1$ is type checked in an environment without a new definition for $x$. Thus, the variable $x$ cannot be used in $e_1$ unless it already has a definition in an outer scope. Second, the body $e_2$ is type checked in the environment $O$ extended with the typing of $x$ at its declared type. Third, note that the type of $x$ may be SELF_TYPE.
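The rule for let with no initialization simply omits the conformance requirement. We give type rules only for a let with a single variable. Typing a multiple let
let $x_1 : T_1$ [<- $e_1$], $x_2 : T_2$ [<- $e_2$], ..., $x_n : T_n$ [<- $e_n$] in $e$
is defined to be the same as typing
let $x_1 : T_1$ [<- $e_1$] in (let $x_2 : T_2$ [<- $e_2$], ..., $x_n : T_n$ [<- $e_n$] in $e$)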
Each branch of a case is type checked in an environment where variable $x_i$ has type $T_i$. The type of the entire case is the join of the types of its branches. The variables declared on each branch of a case must all have distinct types.
The predicate of a loop must have type Bool; the type of the entire loop is always Object. An isvoid test has type Bool:
With the exception of the rule for equality, the type checking rules for the primitive logical and arithmetic operations are easy.
The wrinkle in the rule for equality is that any types may be freely compared except Int, String, and Bool, which may only be compared with objects of the same type. The cases for < and <= are similar to the rule for equality.
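The restriction on comparisons is small enough to state as a single check. The Python sketch below is an illustration only; the function name and the set `BASIC` are assumptions made for this example.

```python
# Types that may only be compared with themselves.
BASIC = {"Int", "String", "Bool"}


def check_comparison(left_type: str, right_type: str) -> str:
    """Return the static type of a comparison, or raise if it is ill-typed."""
    if (left_type in BASIC or right_type in BASIC) and left_type != right_type:
        raise TypeError(f"cannot compare {left_type} with {right_type}")
    return "Bool"  # every well-typed comparison has type Bool
```

Two operands of unrelated non-basic classes are accepted statically; what such a comparison returns is a matter for the operational semantics.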
The final cases are type checking rules for attributes and methods. For a class $C$, let the object environment $O_C$ give the types of all attributes of $C$ (including any inherited attributes). More formally, if $x$ is an attribute (inherited or not) of $C$, and the declaration of $x$ is $x : T$, then
$$O_C(x) = T$$
The method environment $M$ is global to the entire program and defines for every class $C$ the signatures of all of the methods of $C$ (including any inherited methods).
The two rules for type checking attribute definitions are similar to the rules for let. The essential difference is that attributes are visible within their initialization expressions. Note that self is bound in the initialization.
The rule for typing methods checks the body of the method in an environment where $O_C$ is extended with bindings for the formal parameters and self. The type of the method body must conform to the declared return type.
There are a number of semantic checks applied to Cool programs that are not captured by formal typing rules. For example, a Cool program cannot contain an inheritance cycle. Similarly, a Cool program cannot contain a class that inherits from Int. These rules are scattered throughout this manual.
The order in which these other checks are performed is unspecified. If a Cool program contains both an inheritance cycle and also a class that inherits from Int, the Cool compiler may report whichever error it prefers.
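Both example checks can be run over a table of declared parents. The Python sketch below is illustrative: the function names, the error tags, and the `parents` table are hypothetical, and the set of forbidden parents assumes the usual Cool restriction on inheriting from Int, String, and Bool.

```python
from typing import Dict, Set, Tuple

FORBIDDEN_PARENTS = {"Int", "String", "Bool"}


def reaches_cycle(parents: Dict[str, str], start: str) -> bool:
    """True if following parent links from start revisits a class."""
    seen: Set[str] = set()
    current = start
    while current in parents:  # stops at classes with no declared parent
        if current in seen:
            return True
        seen.add(current)
        current = parents[current]
    return False


def semantic_errors(parents: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Collect (kind, class) pairs; the order of reporting is left open."""
    errors: Set[Tuple[str, str]] = set()
    for cls, parent in parents.items():
        if parent in FORBIDDEN_PARENTS:
            errors.add(("bad-parent", cls))
        if reaches_cycle(parents, cls):
            errors.add(("cycle", cls))
    return errors


parents = {"A": "B", "B": "A", "C": "Int", "D": "Object"}
found = semantic_errors(parents)
```

Returning a set rather than a list reflects that no particular reporting order is required.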
This section contains a mostly formal presentation of the operational semantics for the Cool language. The operational semantics define for every Cool expression what value it should produce in a given context. The context has three components: an environment, a store, and a self object. These components are described in the next section. Section 13.2 defines the syntax used to refer to Cool objects, and Section 13.3 defines the syntax used to refer to class definitions.
Keep in mind that a formal semantics is a specification only--it does not describe an implementation. The purpose of presenting the formal semantics is to make clear all the details of the behavior of Cool expressions. How this behavior is implemented is another matter.
Before we can present a semantics for Cool we need a number of concepts and a considerable amount of notation. An environment is a mapping of variable identifiers to locations. Intuitively, an environment tells us for a given identifier the address of the memory location where that identifier's value is stored. For a given expression, the environment must assign a location to all identifiers to which the expression may refer. For the expression, e.g., $a + b$, we need an environment that maps $a$ to some location and $b$ to some location. We'll use the following syntax to describe environments, which is very similar to the syntax of type assumptions used in Section 12.
$$E = [a : l_1, b : l_2]$$
This environment maps $a$ to location $l_1$, and $b$ to location $l_2$.
The second component of the context for the evaluation of an expression is the store (memory). The store maps locations to values, where values in Cool are just objects. Intuitively, a store tells us what value is stored in a given memory location. For the moment, assume all values are integers. A store is similar to an environment:
$$S = [l_1 \to 55, l_2 \to 77]$$
This store maps location $l_1$ to value 55 and location $l_2$ to value 77.
Given an environment and a store, the value of an identifier can be found by first looking up the location that the identifier maps to in the environment and then looking up the location in the store.
Together, the environment and the store define the execution state at a particular step of the evaluation of a Cool expression. The double indirection from identifiers to locations to values allows us to model variables. Consider what happens if the value 99 is assigned to variable $a$ in the environment and store defined above. Assigning to a variable means changing the value to which it refers but not its location. To perform the assignment, we look up the location for $a$ in the environment and then change the mapping for the obtained location to the new value, giving a new store $S'$.
$$E(a) = l_1$$
$$S' = S[99/l_1]$$
The syntax $S[v/l]$ denotes a new store that is identical to the store $S$, except that $S'$ maps location $l$ to value $v$. For all locations $l'$ where $l' \neq l$, we still have $S'(l') = S(l')$.
The store models the contents of memory of the computer during program execution. Assigning to a variable modifies the store.
There are also situations in which the environment is modified. Consider the following Cool fragment:
let c : Int <- 33 in c
When evaluating this expression, we must introduce the new identifier c into the environment before evaluating the body of the let. If the current environment and store are $E$ and $S$, then we create a new environment $E'$ and a new store $S'$ defined by:
$$l_c = newloc(S)$$
$$E' = E[l_c/c]$$
$$S' = S[33/l_c]$$
The first step is to allocate a location for the variable c. The location should be fresh, meaning that the current store does not have a mapping for it. The function $newloc()$ applied to a store gives us an unused location in that store. We then create a new environment $E'$, which maps c to $l_c$ but also contains all of the mappings of $E$ for identifiers other than c. Note that if c already has a mapping in $E$, the new environment $E'$ hides this old mapping. We must also update the store to map the new location to a value. In this case $l_c$ maps to the value 33, which is the initial value for c as defined by the let-expression.
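The interplay of environment and store can be made concrete with two dictionaries. The Python sketch below is a simplified model under the same "all values are integers" assumption; integer locations, the helper names, and the sample values for a and b are illustrative choices, not something Cool prescribes.

```python
from typing import Dict, Tuple

Env = Dict[str, int]    # identifier -> location
Store = Dict[int, int]  # location -> value


def newloc(store: Store) -> int:
    """Return a location that the store does not yet map."""
    return max(store, default=0) + 1


def lookup(env: Env, store: Store, name: str) -> int:
    """Identifier -> location -> value."""
    return store[env[name]]


def assign(env: Env, store: Store, name: str, value: int) -> Store:
    """Return a new store; the environment is unchanged by assignment."""
    updated = dict(store)
    updated[env[name]] = value
    return updated


def bind(env: Env, store: Store, name: str, value: int) -> Tuple[Env, Store]:
    """Introduce name at a fresh location, as a let does."""
    loc = newloc(store)
    return {**env, name: loc}, {**store, loc: value}


env = {"a": 1, "b": 2}
store = {1: 55, 2: 77}
store2 = assign(env, store, "a", 99)        # an assignment to a
env3, store3 = bind(env, store2, "c", 33)   # let c : Int <- 33 in ...
```

Assignment produces a new store and leaves the environment alone, while a let produces both a new environment and a new store.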
The example in this subsection oversimplifies Cool environments and stores a bit, because simple integers are not Cool values. Even integers are full-fledged objects in Cool.
Every Cool value is an object. Objects contain a list of named attributes, a bit like records in C. In addition, each object belongs to a class. We use the following syntax for values in Cool:
$$v = X(a_1 = l_1, a_2 = l_2, \ldots, a_n = l_n)$$
Read the syntax as follows: The value $v$ is a member of class $X$ containing the attributes $a_1, \ldots, a_n$ whose locations are $l_1, \ldots, l_n$. Note that the attributes have an associated location. Intuitively this means that there is some space in memory reserved for each attribute. The value $v$ has dynamic type $X$.
For base objects of Cool (i.e., Ints, Strings, and Bools) we use a special case of the above syntax. Base objects have a class name, but their attributes are not like attributes of normal classes, because they cannot be modified. Therefore, we describe base objects using the following syntax:
Int(5)
Bool(true)
String(4, "Cool")
For Ints and Bools, the meaning is obvious. Strings contain two parts, the length and the actual sequence of ASCII characters.
In the rules presented in the next section, we need a way to refer to the definitions of attributes and methods for classes. Suppose we have the following Cool class definition:
class B {
  s : String <- "Hello";
  g (y:String) : String { y.concat(s) };
  f (x:Int) : Int { x+1 };
};

class A inherits B {
  a : Int;
  b : B <- new B;
  f(x:Int) : Int { x+a };
};
Two mappings, called class and implementation, are associated with class definitions. The class mapping is used to get the attributes, as well as their types and initializations, of a particular class:
$$class(A) = (s : \mathrm{String} \leftarrow \text{"Hello"},\ a : \mathrm{Int} \leftarrow 0,\ b : B \leftarrow \mathrm{new}\ B)$$
Note that the information for class A contains everything that it inherited from class B, as well as its own definitions. If B had inherited other attributes, those attributes would also appear in the information for A. The attributes are listed in the order they are inherited and then in source order: all the attributes from the greatest ancestor are listed first in the order in which they textually appear, then the attributes of the next greatest ancestor, and so on, on down to the attributes defined in the particular class. We rely on this order in describing how new objects are initialized.
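The ordering described above falls out of a short recursion over the inheritance chain. In this Python sketch the `CLASSES` table mirrors the example classes A and B; the table layout and the function name are assumptions made for illustration, and Object is omitted because it contributes no attributes.

```python
from typing import Dict, List, Optional, Tuple

Attribute = Tuple[str, str]  # (name, declared type)

# Hypothetical class table: class -> (parent, attributes in source order).
CLASSES: Dict[str, Tuple[Optional[str], List[Attribute]]] = {
    "B": (None, [("s", "String")]),
    "A": ("B", [("a", "Int"), ("b", "B")]),
}


def class_attributes(name: str) -> List[Attribute]:
    """Attributes of the greatest ancestor first, then down to name itself."""
    parent, own = CLASSES[name]
    inherited = class_attributes(parent) if parent is not None else []
    return inherited + own
```

For class A this yields s, then a, then b, which is the order in which a new A object would initialize its attributes.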
The general form of a class mapping is:
$$class(X) = (a_1 : T_1 \leftarrow e_1, \ldots, a_n : T_n \leftarrow e_n)$$
Note that every attribute has an initializing expression, even if the Cool program does not specify one for each attribute. The default initialization for a variable or attribute is the default of its type. The default of Int is 0, the default of String is "", the default of Bool is false, and the default of any other type is void. The default of type $T$ is written $D_T$.
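The implementation mapping gives information about the methods of a class. For the above example, implementation of A is defined as follows:
$$implementation(A, f) = (x,\ x + a)$$
$$implementation(A, g) = (y,\ y.concat(s))$$
In general, for a class $X$ and a method $m$,
$$implementation(X, m) = (x_1, x_2, \ldots, x_n, e_{body})$$
specifies that method $m$, when invoked from class $X$, has formal parameters $x_1, \ldots, x_n$, and the body of the method is expression $e_{body}$.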
Equipped with environments, stores, objects, and class definitions, we can now attack the operational semantics for Cool. The operational semantics is described by rules similar to the rules used in type checking. The general form of the rules is:
$$\frac{\vdots}{so, S, E \vdash e_1 : v, S'}$$
The rule should be read as: In the context where self is the object $so$, the store is $S$, and the environment is $E$, the expression $e_1$ evaluates to object $v$ and the new store is $S'$. The dots above the horizontal bar stand for other statements about the evaluation of sub-expressions of $e_1$.
Besides an environment and a store, the evaluation context contains a self object $so$. The self object is just the object to which the identifier self refers if self appears in the expression. We do not place self in the environment and store because self is not a variable: it cannot be assigned to. Note that the rules specify a new store after the evaluation of an expression. The new store contains all changes to memory resulting as side effects of evaluating expression $e_1$.
The rest of this section presents and briefly discusses each of the operational rules. A few cases are not covered; these are discussed at the end of the section.
An assignment first evaluates the expression on the right-hand side, yielding a value $v_1$. This value is stored in memory at the address for the identifier.
The rules for identifier references, self, and constants are straightforward:
The tricky thing in a new expression is to initialize the attributes in the right order. If an attribute does not have an initializer, evaluate an assignment expression for it in the final step. Note also that, during initialization, attributes are bound to the default of the appropriate class.
The two dispatch rules do what one would expect. The arguments are evaluated and saved. Next, the expression on the left-hand side of the . is evaluated. In a normal dispatch, the class of this expression is used to determine the method to invoke; otherwise the class is specified in the dispatch itself.
There are no surprises in the if-then-else rules. Note that the value of the predicate is a Bool object, not a boolean.
Blocks are evaluated from the first expression to the last expression, in order. The result is the result of the last expression.
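A let evaluates any initialization code, assigns the result to the variable at a fresh location, and evaluates the body of the let. (If there is no initialization, the variable is initialized to the default value of its declared type.) We give the operational semantics only for the case of let with a single variable. The semantics of a multiple let
let $x_1 : T_1$ <- $e_1$, $x_2 : T_2$ <- $e_2$, ..., $x_n : T_n$ <- $e_n$ in $e$
is defined to be the same as
let $x_1 : T_1$ <- $e_1$ in (let $x_2 : T_2$ <- $e_2$, ..., $x_n : T_n$ <- $e_n$ in $e$)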
Note that the case rule requires that the class hierarchy be available in some form at runtime, so that the correct branch of the case can be selected. This rule is otherwise straightforward.
There are two rules for while: one for the case where the predicate is true and one for the case where the predicate is false. Both cases are straightforward. The two rules for isvoid are also straightforward:
The remainder of the rules are for the primitive arithmetic and logical operations. These are all easy rules.
Cool Ints are 32-bit two's complement signed integers; the arithmetic operations are defined accordingly.
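An interpreter written in a language with unbounded integers has to reduce results to 32 bits itself. The Python sketch below assumes that overflow wraps around, which is the usual reading of two's complement arithmetic; the helper names are hypothetical and division is left out.

```python
INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1


def wrap32(n: int) -> int:
    """Reduce an unbounded integer to the signed 32-bit range."""
    return (n - INT_MIN) % (2 ** 32) + INT_MIN


def cool_add(a: int, b: int) -> int:
    return wrap32(a + b)


def cool_sub(a: int, b: int) -> int:
    return wrap32(a - b)


def cool_mul(a: int, b: int) -> int:
    return wrap32(a * b)
```

With this model, adding 1 to the largest Int gives the smallest Int rather than an error.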
The notation and rules given above are not powerful enough to describe how objects are tested for equality, or how runtime exceptions are handled. For these cases we resort to an English description.
In $e_1 = e_2$, first $e_1$ is evaluated and then $e_2$ is evaluated. The two objects are compared for equality by first comparing their pointers (addresses). If they are the same, the objects are equal. The value void is not equal to any object except itself. If the two objects are of type String, Bool, or Int, their respective contents are compared. < and <= are handled similarly. The case for integer arguments is simple:
... but < and <= also admit String and Bool comparisons. String comparisons are performed using the standard ASCII string ordering. For booleans, false is defined to be less than true. Any other comparison (e.g., a comparison among non-void objects of different types) returns false. Note that for some objects this may be unintuitive: if $x$ and $y$ are objects of two different classes, then $x < y$ is false but $y < x$ is also false. Note also that comparison is based on the dynamic type of the object, not on the static type of the object.
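The English description can be turned into a small decision procedure. The Python sketch below is one reading of it, with Cool values modelled by Python values: int for Int, str for String, bool for Bool, None for void, and instances of a hypothetical `CoolObject` class for everything else. Treating a less-than test on void as false is an assumption.

```python
class CoolObject:
    """Stand-in for any non-basic Cool object (hypothetical)."""


def cool_equals(x: object, y: object) -> bool:
    if x is y:                      # same pointer, including void = void
        return True
    if x is None or y is None:      # void equals nothing but itself
        return False
    if type(x) is type(y) and isinstance(x, (bool, int, str)):
        return x == y               # basic values compare by contents
    return False


def cool_less_than(x: object, y: object) -> bool:
    if x is None or y is None:
        return False
    # The dynamic types must match exactly; bool is not treated as int here.
    if type(x) is type(y) and isinstance(x, (bool, int, str)):
        return x < y                # str comparison matches ASCII order for ASCII text
    return False                    # every other comparison is false
```

For two distinct non-basic objects, both `cool_less_than(a, b)` and `cool_less_than(b, a)` return False.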
In addition, the operational rules do not specify what happens in the event of a runtime error. When a runtime error occurs, output is flushed and execution aborts. The following list specifies all possible runtime errors.
Each outstanding "method invocation" (static or dynamic) and each outstanding "new" object allocation expression counts as a "Cool Activation Record". (Just to be clear, that second clause about "new" is counting currently-resolving constructor calls, not "total objects living in the heap".) A Cool interpreter flags a "stack overflow" runtime error if and only if there are 1000 (one thousand) or more outstanding Cool Activation Records.
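One way to enforce this limit is to keep a counter that is incremented on entry to every dispatch and every new, and decremented on exit. The Python sketch below illustrates the bookkeeping only; the class and method names are hypothetical.

```python
class CoolStackOverflow(RuntimeError):
    """Raised when too many activation records are outstanding."""


class ActivationStack:
    LIMIT = 1000  # one thousand or more outstanding records is an error

    def __init__(self) -> None:
        self.outstanding = 0

    def enter(self) -> None:
        """Call when a method invocation or a new expression begins."""
        self.outstanding += 1
        if self.outstanding >= self.LIMIT:
            raise CoolStackOverflow("stack overflow")

    def leave(self) -> None:
        """Call when that invocation or allocation has finished."""
        self.outstanding -= 1
```

With this counter, 999 nested activations are accepted and the 1000th triggers the error.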
Finally, the rules given above do not explain the execution behaviour for dispatches to primitive methods defined in the Object, IO, or String classes. Descriptions of these primitive methods are given in Sections 8.3-8.5.
Cool Assembly Language is a simplified RISC-style assembly language that is reminiscent of MIPS Assembly Language crossed with x86 Assembly Language. It also features typing aspects that may remind one of Java Bytecode.
A Cool Assembly Language program is a list of instructions. Each instruction may be preceded by any number of labels. Comments follow the standard Cool conventions. In addition, a semicolon functions like a double dash in that it marks the rest of that line as a comment. The Cool CPU is a load-store architecture with eight general purpose registers and three special-purpose registers. For simplicity, a machine word can hold either a 32-bit integer value or an entire raw string; regardless, all machine words have size one.
This document assumes that you already have some familiarity with assembly language, registers, and how CPUs operate. We first present a formal grammar and then explain the semantics. Only terms in typewriter font are part of the formal grammar. Text after -- is a comment. We use italics for non-terminals.
That's it, and the last two do not really count. We next describe the interpretation of these instructions in more detail.
The system calls available are:
These system calls correspond directly to internal predefined methods on Cool Int and String objects. The key difference is that the system calls work on raw values (i.e., machine-level ints and strings) and not on Cool Objects.
The normal Cool compiler executable (e.g., cool) also serves as a Cool CPU Simulator that executes Cool Assembly Language programs. Just pass file.cl-asm as an argument.
The simulator performs the following actions:
The constant values listed above (1000; 20,000; 2,000,000,000) should not be counted on by your program, but are listed here to help with debugging. Addresses near 1000 hold program instructions or compile-time data (i.e., the code segment), addresses near 20,000 hold the heap, and addresses near two billion are on the stack.
Debugging assembly language programs is notoriously difficult! While writing your code generator, you will spend quite a bit of time running generated Cool Assembly programs through the Cool CPU Simulator to see if they work. Often they will not. The Cool CPU Simulator has been designed with a large number of features to aid debugging. Basically none of these features are present in traditional assemblers, so you actually have a wealth of debugging support, but it will still be difficult.
The Cool reference compiler also includes options to produce control flow graphic visualizations in the style of the dotty tool from the Graphviz toolkit.
Passing the option (with, for example, ) produces , which can then be inspected via a number of tools. For example, this program:
class Main {
  main():Object {
    if (isvoid self) then
      (new IO).out_string("cannot happen!\n")
    else
      (new IO).out_string("hello, world!\n")
    fi
  };
};
might produce this control-flow graph:
While you do not have to match the reference compiler exactly, inspecting its control-flow graphs can help you debug your own code to create control-flow graphs.
As discussed above, the Cool reference compiler also includes a reference machine simulator to interpret Cool Assembly Language instructions. This simulator can be invoked directly by passing a .cl-asm file to cool:
cool$ cat hello-world.cl
class Main {
  main():Object {
    (new IO).out_string("hello, world!\n")
  };
};
cool$ ./cool --asm hello-world.cl
cool$ ./cool hello-world.cl-asm
hello, world!
The simulator can also give detailed performance information:
cool$ ./cool --profile hello-world.cl-asm
hello, world!
PROFILE:           instructions =   107 @    1 =>    107
PROFILE:        pushes and pops =    29 @    1 =>     29
PROFILE:             cache hits =    22 @    0 =>      0
PROFILE:           cache misses =   570 @  100 =>  57000
PROFILE:     branch predictions =     0 @    0 =>      0
PROFILE:  branch mispredictions =    11 @   20 =>    220
PROFILE:        multiplications =     0 @   10 =>      0
PROFILE:              divisions =     0 @   40 =>      0
PROFILE:           system calls =     2 @ 1000 =>   2000
CYCLES: 59356
The execution time of a Cool Assembly Language program is measured in simulated instruction cycles. In general, each assembly instruction takes one cycle. Some instructions, such as system calls or memory operations, can cost many more cycles. The total cycle cost of a program is the sum of all of its component cycle costs.
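The total can be recomputed by hand from a profile listing: each line contributes its count multiplied by its per-event cost. The Python sketch below uses the counts and costs from the hello-world profile shown earlier; the tuple layout and the function name are assumptions made for this example.

```python
from typing import List, Tuple

# (component, count, cycles per event), copied from the hello-world profile.
PROFILE: List[Tuple[str, int, int]] = [
    ("instructions", 107, 1),
    ("pushes and pops", 29, 1),
    ("cache hits", 22, 0),
    ("cache misses", 570, 100),
    ("branch predictions", 0, 0),
    ("branch mispredictions", 11, 20),
    ("multiplications", 0, 10),
    ("divisions", 0, 40),
    ("system calls", 2, 1000),
]


def total_cycles(profile: List[Tuple[str, int, int]]) -> int:
    """Sum count * cost over every component."""
    return sum(count * cost for _, count, cost in profile)


print(total_cycles(PROFILE))  # prints 59356, matching the CYCLES line
```

In this run cache misses account for 57000 of the 59356 cycles, so memory behaviour dominates the cost.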
In modern architectures, memory hierarchy effects (e.g., caching) and branch prediction are dominant factors in the execution speed of a program. To give you a flavor for what real-world code optimization is like, the Cool Simulator also simulates a cache and a branch predictor.
The Cool Simulator features a 64-word least-recently-used fully associative combined instruction and data cache. It also uses a static backward = taken, forward = not taken branch prediction scheme.
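To reason about hit and miss counts it can help to model the two mechanisms directly. The Python sketch below is a simplified model, not the simulator's actual code: the class and function names are hypothetical, and treating a branch to its own address as backward is an assumption.

```python
from collections import OrderedDict


class WordCache:
    """Fully associative cache of word addresses with least-recently-used eviction."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.lines: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Record one access to address and return True on a hit."""
        if address in self.lines:
            self.lines.move_to_end(address)   # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used
        self.lines[address] = None
        return False


def predicts_taken(branch_address: int, target_address: int) -> bool:
    """Static scheme: backward branches are predicted taken, forward ones not."""
    return target_address <= branch_address
```

Because instructions and data share the same 64 words, a loop whose code and working set together exceed that size will keep evicting its own instructions.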
We now discuss each of the performance components in turn:
This cost model involves realistic components but potentially unrealistic values (e.g., a modern CPU would have a much larger non-associative cache, and also a much larger cache miss cost). If you're interested in that sort of performance modeling, take a graduate class in computer architecture. You should know that this CPU performance model is one of the most realistic that I've seen for a compiler optimization project in terms of the issues that it forces you to address.
The reference compiler includes a simple reference peephole optimizer, as well as a few optimizations backed by dataflow analyses (liveness, reaching definitions, constant folding) and register allocation, enabled via the --opt flag. You can use it to get an idea for how to get started (but note that we are evil and strip all comments from the optimized output).
yuki:~/src/cool$ ./cool --opt --asm hello-world.cl
yuki:~/src/cool$ ./cool --profile hello-world.cl-asm
hello, world!
PROFILE:           instructions =    79 @    1 =>     79
PROFILE:        pushes and pops =    23 @    1 =>     23
PROFILE:             cache hits =    15 @    0 =>      0
PROFILE:           cache misses =   513 @  100 =>  51300
PROFILE:     branch predictions =     2 @    0 =>      0
PROFILE:  branch mispredictions =     7 @   20 =>    140
PROFILE:        multiplications =     0 @   10 =>      0
PROFILE:              divisions =     0 @   40 =>      0
PROFILE:           system calls =     2 @ 1000 =>   2000
CYCLES: 53542
For the hello-world.cl program, this optimizer reduces the cycle cost from 59356 to 53542, roughly a 10% improvement. If you are writing an optimizer, you will want to do at least as well as the reference, averaged over many input programs. Notably, you'll probably want to implement much more than the required dead code elimination optimization.
Cool is based on Sather164, which is itself based on the language Sather. Portions of this document were cribbed from the Sather164 manual; in turn, portions of the Sather164 manual are based on Sather documentation written by Stephen M. Omohundro.
A number of people have contributed to the design and implementation of Cool, including Manuel Fähndrich, David Gay, Douglas Hauge, Megan Jacoby, Tendo Kayiira, Carleton Miyamoto, and Michael Stoddart. Joe Darcy updated Cool to the current version.
This version (used in Virginia Programming Language Design and Implementation courses) of Cool owes a great debt to George C. Necula and Bor-Yuh Evan Chang.