Core language¶
A program in Agda consists of a number of declarations written in an *.agda
file. A declaration introduces a new identifier and gives its type and
definition. It is possible to declare:
record types (including coinductive records)
function definitions (including mixfix operators, abstract definitions, and opaque definitions)
precedence (fixity)
pragmas, and
Declarations have a signature part and a definition part. These can appear separately in the program. Names must be declared before they are used, but by separating the signature from the definition it is possible to define things in mutual recursion.
Grammar¶
At its core, Agda is a dependently typed lambda calculus. The grammar of
terms where a
represents a generic term is:
a ::= x -- variable
| λ x → a -- lambda abstraction
| f -- defined function
| (x : a) → a -- function space
| F -- data/record type
| c a -- data/record constructor
| s -- sort Seti, Setω+i
Syntax overview¶
The syntax of an Agda program is defined in terms of three key components:
Expressions write function bodies and types.
Declarations declare types, data-types, postulates, records, functions etc.
Pragmas define program options.
There are also three main levels of syntax, corresponding to different levels of interpretation:
Concrete is the high-level sugared syntax, it representing exactly what the user wrote (Agda.Syntax.Concrete).
Abstract, before typechecking (Agda.Syntax.Abstract)
Internal, the full-intepreted core Agda terms, typechecked; roughly corresponding to (Agda.Syntax.Internal).
The process of translating an *.agda
file into an executable has several
stages:
*.agda file
==[ parser (Lexer.x + Parser.y) ]==>
Concrete syntax
==[ nicifier (Syntax.Concrete.Definitions) ]==>
'Nice' concrete syntax
==[ scope checking (Syntax.Translation.ConcreteToAbstract) ]==>
Abstract syntax
==[ type checking (TypeChecking.Rules.*) ]==>
Internal syntax
==[ Agda.Compiler.ToTreeless ]==>
Treeless syntax
==[ different backends (Compiler.MAlonzo.*, Compiler.JS.*, ...) ]==>
Source code
==[ different compilers (GHC compiler, ...) ]==>
Executable
The following sections describe these stages in more detail:
Lexer¶
Lexical analysis (aka tokenization) is the process of converting a sequence of
characters (the raw *.agda
file) into a sequence of tokens (strings with a
meaning).
The lexer in Agda is generated by Alex, and is an adaptation of GHC’s lexer.
The main lexing function lexer
is called by the
Agda.Syntax.Parser.Parser
to get the next token from the input.
Parser¶
The parser is the component that takes the output of the lexer and builds a data structure that we will call Concrete Syntax, while checking for correct syntax.
The parser is generated by Happy.
Example: when a name is a sequence of parts, the lexer just sees it as a string, the parser does the translation in this step.
Concrete Syntax¶
The concrete syntax is a raw representation of the program text without any desugaring at all. This is what the parser produces. The idea is that if we figure out how to keep the concrete syntax around, it can be printed exactly as the user wrote it.
Nice Concrete Syntax¶
The Nice Concrete Syntax
is a slightly reorganized version of the
Concrete Syntax
that is easier to deal with internally. Among other
things, it:
detects mutual blocks
assembles definitions from their isolated parts
collects fixity information of mixfix operators and attaches it to definitions
emits warnings for possibly unintended but still valid declarations, which essentially is dead code such as empty
instance
blocks and misplaced pragmas
Abstract Syntax¶
The translation from Agda.Syntax.Concrete
to Agda.Syntax.Abstract
involves scope analysis, figuring out infix operator precedences and tidying
up definitions.
The abstract syntax Agda.Syntax.Abstract
is the result after desugaring
and scope analysis of the concrete syntax. The type checker works on abstract
syntax, producing internal syntax.
Internal Syntax¶
This is the final stage of syntax before being handed off to one of the backends. Terms are well-scoped and well-typed.
While producing the Internal Syntax
, terms are checked for safety. This
safety check means termination check and
coverage check for functions, and positivity check
for datatypes.
Type-directed operations such as instance resolution and disambiguation of overloaded constructors (different constructors with the same name) also happen here.
The internal syntax Agda.Syntax.Internal
uses the following haskell
datatype to represent the grammar of a Term
presented above.
data Term = Var {-# UNPACK #-} !Int Elims -- ^ @x es@ neutral
| Lam ArgInfo (Abs Term) -- ^ Terms are beta normal. Relevance is ignored
| Lit Literal
| Def QName Elims -- ^ @f es@, possibly a delta/iota-redex
| Con ConHead ConInfo Elims
-- ^ @c es@ or @record { fs = es }@
-- @es@ allows only Apply and IApply eliminations,
-- and IApply only for data constructors.
| Pi (Dom Type) (Abs Type) -- ^ dependent or non-dependent function space
| Sort Sort
| Level Level
| MetaV {-# UNPACK #-} !MetaId Elims
Treeless Syntax¶
The treeless syntax is intended to be used as input for the compiler backends. It is more low-level than the internal syntax and is not used for type checking. Some of the features of the treeless syntax are:
case expressions instead of case trees
no instantiated datatypes / constructors
For instance, the Glasgow Haskell Compiler (GHC) backend translates the treeless syntax into a proper GHC Haskell program.
Another backend that may be used is the JavaScript backend, which translates the treeless syntax to JavaScript code.
The treeless representation of the program has A-normal form (ANF). That means that all the case expressions are targeting a single variable, and all alternatives may only peel off one constructor.
The backends can handle an ANF syntax easier than a syntax of a language where one may case arbitrary expressions and use deep patterns.