proj-oot-oot

Project status

I am just writing down ideas. I have many pages of notes on things i may or may not want to put into Oot. Please note that my philosophy is, whenever i take notes for myself, if i don't want to keep them private, to put them on my website; these notes are therefore here, not because they are intended to be readable by anyone other than me, but just as notes to myself. Over time i hope to evolve them into something readable and after that i'll remove this disclaimer.

At this point, the design is unfinished; there is no coherent concept of a language called 'Oot', just a bunch of ideas towards such a proposal. There is no timeframe to actually implement this, and it most likely will never get finished.

To be clear, i repeat myself: there is no such thing as Oot. Oot has neither been designed nor implemented. These notes about 'Oot' contain many conflicting proposals.

This document is very out of date and no longer serves as a good introduction to the evolving language proposal.

You may want to look at proj-oot-whyOot.

There are a bunch of other files in this folder with more details and notes on various parts of Oot: proj-oot.

update: i'm starting to develop an OVM (Oot Virtual Machine) implementation. This is not Oot, but it is the language that Oot will be implemented in (for portability), and it will also be the IL to which Oot programs are compiled. See https://gitlab.com/oot/ovm


Oot

Oot is a readable, massively concurrent, programmable programming language.

Oot is a general-purpose language that particularly targets the following application domains:

TODO: the above lists, while detailed and accurate, are maybe a little too long and too wordy; compared to the list in the OLD section below

Why not Oot?

Oot aspires to combine the readability and ease of Python, the functional laziness of Haskell, the commandline suitability of Perl, the straightforwardness of C, the metaprogrammability of Lisp, and the simplicity of BASIC (well, not quite), with massive concurrency.

TODO: unify this with whyOot.txt:Intro

Hello world

print! "Hello world!"

Overview

In this section we briefly explain the basics of Oot. Our purpose is to give you a sense of Oot, and also the background you'll need for understanding the examples in the Tour.

TODO this is much too detailed, move a lot of this into 'details'

Overview of syntax

Initial gotchas of Oot syntax

If you are coming from another language, you may be surprised by the following aspects of Oot's syntax and builtin functions:

General remarks

Unlike e.g. Python, the AMOUNT of whitespace does not matter in Oot; but the PRESENCE or ABSENCE of whitespace, and the TYPE of whitespace (whether it is a space, a newline, or a blank line) does matter.

Only punctuation affects the way things are parsed. There are no alphanumeric 'reserved words' except for the 26 one-letter uppercase macros (eg 'A' is infix boolean 'and').

A person reading code does not have to look up any function or macro definitions to see how things will parse (except of course within metaprogramming constructs that essentially pass a string to something like a custom 'eval').

The general idea is to start with a hypothetical homeoiconic syntax, but (in comparison to Lisp) built upon associative arrays/labeled trees instead of Lisp's lists/s-exprs/unlabeled trees. Then, add syntax for convenience, but the syntax is generic rather than specific to particular language constructs. Symbols may have more than one meaning depending on context, but only when these meanings are conceptually related (to aid memory).

Graph constructors

TODO (see each of the two, somewhat conflicting, sections below for now):

note that within data constructors, the lhs of = and / can be dynamically computed at runtime by using parentheses (entering code context) TODO MOVE TO DETAILS

todo within data constructors, ["" means 'the items in here are all quoted', eg [""a b c] == ["a" "b" "c"]; except that as a special case, both [""] and ["" ] are equivalent to [ "" ] (a list containing only the empty string).

Semicolons (';') delimit 'lines'

Semicolons denote a sequencing relationship between the things they delimit.

Semicolons are often implicitly added at newlines, but any Oot program can be written as a 'one-liner' by using explicit semicolons.

To delimit a line without denoting a sequencing relationship (this also hints to the Oot implementation that it may be worth it to parallelize the evaluation of the two things broken up in this way), use colon-semicolon, ':;', which you can think of as a semicolon with a modifier.

todo: in 'a ; b :; c ; d', does ; or :; have precedence, that is, is this like a ; (b :; c) ; d, or like (a ; b) :; (c ; d)?

Comments are two or more semicolons (';;')

Comments are started by two or more adjacent semicolons. If the semicolons are followed by whitespace, the comment runs to the end of the current line. Otherwise, the semicolons and the non-whitespace characters immediately following them together form a custom delimiter, and the comment runs until the next occurrence of that delimiter.

print 1+1  ;; nice day today
print 1+1  ;;;;; yay!
print 1+  ;;xyz one ;;xyz 1
print 1+  ;;xyz man this is
                quite a long
                comment
                ;;xyz 1
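The comment rule above can be sketched in Python (Oot has no implementation, so this is only an illustration). The name `strip_comments` is hypothetical. The sketch assumes the custom delimiter is the run of semicolons plus the non-whitespace characters attached to it, and it ignores string literals entirely.

```python
import re

# Two or more semicolons, plus any non-whitespace characters attached to them.
_COMMENT_OPEN = re.compile(r";{2,}\S*")


def strip_comments(src):
    """Remove Oot-style comments from src (sketch; string literals are ignored)."""
    out = []
    pos = 0
    while True:
        m = _COMMENT_OPEN.search(src, pos)
        if m is None:
            out.append(src[pos:])
            break
        out.append(src[pos:m.start()])
        delim = m.group(0)
        if delim.strip(";") == "":
            # Only semicolons: the comment runs to the end of the line.
            end = src.find("\n", m.end())
            pos = len(src) if end == -1 else end
        else:
            # Custom delimiter: the comment runs to the next occurrence of it.
            close = src.find(delim, m.end())
            if close == -1:
                raise ValueError("unterminated comment opened by " + delim)
            pos = close + len(delim)
    return "".join(out)
```

With this sketch, each of the four sample lines above reduces to the tokens `print 1+1` or `print 1+ 1` once whitespace is normalized.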

Identifiers and uppercase/lowercase

Identifiers can consist of alphanumeric characters and dashes, except that they cannot start with a dash, and if they end with a dash then that has a special meaning. The function of a string of alphanumeric characters and dashes varies depending on case:

Identifiers that end with one dash are 'private' to the module they are defined in. Identifiers that end with two dashes are reserved for use by the Oot language.

Grouping and precedence

Grouping is supplied by parentheses (), blocks {}, graph constructors [], semicolons ;, and colons :.

Blocks are constructors for first-class (lists of) expressions.

The precedence of infix operators can be determined by the characters in them (eg you don't have to look up the function definition of infix operators in order to determine how they parse).

Colons (':') are for implicit grouping

todo: mb this should be ',,' ',,,' etc; they can still be multidim when in data context; then ':' can be left for type annotation; although we could use '::' for that instead

A suffix colon begins a (possibly multiline) parenthetical group; a prefix colon identifies a keyword, and creates an implicit block as its argument.

For example:

if: condition
  :then
    first
  :else
    second

Infix colons create implicit parentheses on both the lefthand and the righthand of the colon. Multiple infix colons do the same thing, but have lower precedence the more of them there are. You can think of infix colons as 'pushing away' everything on the left from everything on the right.

For example,

a b : c d

is equivalent to

(a b) (c d);

Another example:

a b :: c d : e f 
g h : i j

is equivalent to:

(a b) ((c d) (e f));
(g h) (i j);
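The infix-colon rule can be sketched in Python as a small rewriting function. This is only an illustration under assumptions: `expand_colons` is a hypothetical name, only freestanding infix colons are handled (not prefix or suffix colons), and tokens are split on whitespace.

```python
def _is_colons(token):
    """True for tokens made only of colons, such as ':' or '::'."""
    return token != "" and set(token) == {":"}


def _group(tokens):
    # The longest colon run has the lowest precedence, so split on it first.
    level = max((len(t) for t in tokens if _is_colons(t)), default=0)
    if level == 0:
        return " ".join(tokens)
    parts, current = [], []
    for t in tokens:
        if _is_colons(t) and len(t) == level:
            parts.append(current)
            current = []
        else:
            current.append(t)
    parts.append(current)
    # Each side of the colon gets its own implicit parentheses.
    return " ".join("(" + _group(p) + ")" for p in parts)


def expand_colons(line):
    """Rewrite one line, replacing infix colons by explicit parentheses."""
    return _group(line.split())
```

Applied to the two examples above, `expand_colons("a b : c d")` gives `(a b) (c d)` and `expand_colons("a b :: c d : e f")` gives `(a b) ((c d) (e f))`.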

Commas are for implicit data construction and grouping

Infix comma turns the current grouping into an implicit data constructor:

3 , 5 == [3 5]

<> is infixification; () is prefixification

To turn a function into an infix operator, surround it with <>s:

div 10 5 == 2 == 10 <div> 5

The thing in the middle of the <>s must be an identifier or literal, not some more complicated expression. And the <>s must surround it without spaces in between. (this is not ambiguous with usage of < and > as less-than, greater-than, because less-than/greater-than are non-associative and so must be explicitly parenthesized if they appear in the same grouping; eg 0 < 5 > 2 is illegal because it is ambiguous between (0 < 5) > 2 and 0 < (5 > 2))

To turn an infix operator into a non-infix (prefix) function, surround it with parentheses:

5 + 3 == 8 == (+) 5 3
addfn = (+)
add_one_to_all = map ((+) 1)

Regions

There is also something called a 'region', marked by double braces: {{}}. Regions are distinguished by an associated value, which immediately precedes the region opening and immediately succeeds the region closing. For example, the following region is associated with the value '3':

v = 3
v{{print "Hi"}}v

Regions do NOT affect parsing or grouping but are rather a method for marking locations in code or data with boundary annotations.

Whitespace: spaces, newlines, and blank lines; prefix and postfix punctuation

Any Oot program can be expressed in a single line. Newlines usually represent implicit semicolons (except when the line ends with a '\', or when all parentheses have not yet been closed). A region of text bounded by square brackets or blank lines is called a 'paragraph'. Paragraphs implicitly close open parentheses. Eg:

print 1 + 1
print (1 +
1)
print (1 +
1

print 2

is equivalent to:

print 1 + 1; print (1 + 1); print (1 + 1); print 2;
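The newline and paragraph rules above can be sketched as a Python function that turns a multi-line program into the equivalent one-liner. The name `to_one_liner` is hypothetical, and the sketch makes simplifying assumptions: it only tracks round parentheses, treats only blank lines as paragraph boundaries, and ignores strings and comments.

```python
def to_one_liner(src):
    """Join lines with explicit semicolons, closing open parens at blank lines."""
    statements = []
    current = []   # pieces of the statement being assembled
    depth = 0      # number of currently unclosed parentheses

    def flush():
        nonlocal current, depth
        if current:
            # A paragraph end implicitly closes any parentheses still open.
            statements.append(" ".join(current) + ")" * depth)
        current = []
        depth = 0

    for line in src.split("\n"):
        text = line.strip()
        if text == "":
            flush()
            continue
        continued = text.endswith("\\")
        if continued:
            text = text[:-1].rstrip()
        current.append(text)
        depth += text.count("(") - text.count(")")
        # A newline acts as a semicolon only when nothing is left open.
        if depth <= 0 and not continued:
            flush()
    flush()
    return " ".join(s + ";" for s in statements)
```

Fed the multi-line example above, this sketch produces the same one-line program shown as its equivalent.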

In Oot, it sometimes matters whether an operator is separated from its argument by a space (freestanding), or whether it is 'attached', and if it is attached, whether it appears on the left or the right side of its argument. So there are four cases, each of which may mean different things:

Often prefix versions of operators are something like a 'constructor', and the suffix version is the corresponding 'destructor'.

todo: should infix attached and unattached be the same except for grouping and mixed case? i think so

References

make a reference:

y = {x}

dereference: !y

Expressions, function application

Everything is an expression. The last line of a function is its (primary) return value. The last line of a branch within an 'if' statement is its return value.

To apply the function 'add' to 1 and 'x' and store the result in 'result':

result = add 1 x

(function application is by juxtaposition)

Partial function application is achieved just by not giving all required arguments:

increment = add 1
result = increment x
result == increment x == (add 1) x

(functions are curried and left-associative)
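For readers more familiar with Python, curried application can be imitated roughly as follows. This is an analogy, not Oot semantics: `curry` is a hypothetical helper, and the arity is passed explicitly rather than inferred.

```python
def curry(fn, arity):
    """Return a version of fn that can be given its arguments a few at a time."""
    def curried(*args):
        if len(args) >= arity:
            return fn(*args[:arity])
        # Not enough arguments yet: remember them and wait for more.
        return lambda *more: curried(*args, *more)
    return curried


add = curry(lambda a, b: a + b, 2)
increment = add(1)      # like 'increment = add 1'
result = increment(41)  # like 'result = increment x'
```

Here `increment(41)` and `add(1)(41)` and `add(1, 41)` all give the same value, mirroring `increment x == (add 1) x`.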

Functions can have keyword arguments, which are bound using '/':

result = bake-n-pies 3 KIND/"pumpkin"

The keywords in keyword arguments (ie the things on the left side of the '/', eg 'KIND' above) can be written in lowercase, in which case they will be autocapitalized, eg:

result = bake-n-pies 3 kind/"pumpkin"

If you want to dynamically compute the keyword at runtime, put parentheses around it, eg:

result = bake-n-pies 3 (keyword-from-str "KIND")/"pumpkin"

Keyword arguments are always optional, and optional arguments are always keyword arguments. When a function is partially applied, any keyword arguments not explicitly given are implicitly assigned to their defaults; they do not remain open for later assignment. However, this behavior can be overridden by using the $/ operator (todo: sure about this? or should the defaults happen when all of the positional arguments are used up?):

g = f x $/

A function can directly access the args it has been passed as 'args--'. To access the args from a higher lexical scope, TODO ('..'? more dashes?)

Variable assignment

x = 3
x = 4

Note that in Oot, '=' does not behave like it does in mathematics; in mathematics, if x equals something, then it is always equal to that thing and can't change to be not equal to it later, but in Oot, '=' is not a statement of equality but rather an action (assignment), and what is assigned to a given variable can change over time.

Defining functions

Similar syntax to variable assignment, except use ':=' instead of '=':

f x y := x + y + 3
f x y := {k = 3; x + y + k}
f 1 2 == 6

Functions with multiple return values (and optional return values) are created using a 'double assignment' syntax. Calling the function normally gives only the first return value. To get the other return value(s), use an assignment with '/'s attached to some of the variables to indicate which arguments they capture (the same syntax as giving keyword arguments to functions). Since the optional return arguments must be assigned via keyword, multiple return arguments can appear in any order:

z k = f x y := {k = 3; z = x + y + k}

main_result = f 1 2
main_result == 6

main_result K/secondary_result = f 1 2
main_result == 6
secondary_result == 3

K/secondary_result main_result = f 1 2
main_result == 6
secondary_result == 3

Note that commas on the LHS create an implicit data constructor, which destructures a single return value, rather than separating multiple return values:

z k = g x y := {k = 3; z = [x y]}

main_result = g 1 2
main_result == [1 2]

main_result K/secondary_result = g 1 2
main_result == [1 2]
secondary_result == 3

[x y] K/k = g 1 2
x == 1
y == 2
k == 3

[x y] = g 1 2
x == 1
y == 2

To cause a function to return all of its potentially multiple return values as a graph, use a postfix asterisk ('*') on the function call:

z k = g x y := {k = 3; z = [x y]}

result = g* 1 2
result == [[1 2] K/3]

[[x y] K/k] = g* 1 2
x == 1
y == 2
k == 3

[x y], K/k = g* 1 2
x == 1
y == 2
k == 3

[x y] K/k = g* 1 2  ;; ERROR; although g returns two values, g* only returns one (unlabeled) value, yet here we erroneously attempt to extract a second value labeled "K"

Built-in Operators

todo

Unary (unary prefix):

!    logical NOT
-    unary minus (arithmetic negation)

Binary:

On numbers, +, -, *, <, <=, >, >= have their usual arithmetic meanings. In addition:

&    logical AND
|    logical OR

Trinary:

Custom operators

To define a custom operator, first define an ordinary function, then assign it to the operator, eg:

f x y = x + y
+<< = f

Custom operators must be composed entirely of punctuation characters.

Unary custom operators must begin with '!'

Binary custom operators must begin with one of:

*+>%&|-

Any binary operator beginning with * or + is assumed to be associative with itself. Other operators do not associate with themselves and must be explicitly parenthesized if they are repeated.

Ternary custom operators must begin with '<'

Unary custom operators are used by attaching them as a prefix to their argument, eg:

Binary custom operators are used freestanding or infix. They have three precedence levels:

Ternary custom operators are used by enclosing the second argument with the operator and its mirror image, with no intervening spaces, and surrounding that with arguments arg1 and arg3 (todo: better explanation here). Eg:

f x y z = (x - y) * (z - y)
<+ = f
5 <+1+> 3 == (5 - 1) * (3 - 1) == 8

To aid readability, the middle argument to a ternary operator can only be a single identifier or a literal, not a more complicated expression:

5 <+(0+1)+> 3 ;; SYNTAX ERROR; middle argument to ternary must be single identifier or literal

After the initial character, a custom operator may contain one or more characters which are any of:

*+<>%*&|~@#$^?

(todo can they really contain any of those characters?)

Note that binary custom operators may not contain '!' or '-' after the first character, to allow things like arg1*-arg2 to be easily and unambiguously parsed as arg1*(-arg2).

Note: there is one unary operator consisting of a single '-' character, unary minus, eg "-3"; this is a special case; additional custom unary operators cannot be defined which start with '-', because that would make it difficult for the reader to tell at a glance if the operator was unary or binary.

There may be more than one unary operator applied to the same alphanumeric base. Unary operators associate to the right with any other unary operators.

Note that although function application binds tighter than any freestanding infix, attached infix binds tighter than function application, eg "f x + 2" is "(f x) + 2" but "f x+2" is "f (x+2)", because attached infix is as if the attached group was surrounded by parentheses. Within an attached infix grouping, the same rules of precedence hold.
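The "attached infix acts as if parenthesized" rule can be illustrated with a small Python rewriting sketch. The name `parenthesize_attached` and the operator character set are assumptions. The dash is deliberately left out of the operator set here, because identifiers such as bake-n-pies also contain dashes and this sketch cannot tell the two apart.

```python
import re

# A whitespace-free token of the form operand OP operand [OP operand ...].
# Assumption: '-' is excluded because it also appears inside identifiers.
_ATTACHED = re.compile(r"^\w+(?:[+*<>%&|]+\w+)+$")


def parenthesize_attached(expr):
    """Wrap each attached-infix group in explicit parentheses."""
    tokens = expr.split()
    return " ".join("(" + t + ")" if _ATTACHED.match(t) else t for t in tokens)
```

So `parenthesize_attached("f x+2")` yields `f (x+2)`, while `f x + 2` is left unchanged and would group as `(f x) + 2` under the freestanding rule.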

To the extent that the order of operations is not determined by these rules, expressions must have explicit parentheses to resolve the remaining ambiguity (it is a syntax error not to).

^# is for imports

^# as a statement of its own imports a module directly into its containing namespace:

^#math
cos 0 == 1

By addressing into ^#, you can import individual items from the module namespace:

^#math.cos
cos 0 == 1

todo: should 'cos' be in quotes

Or multiple names at once:

^#math.[cos sin]
cos 0 == 1
sin 0 == 0

^## imports a module and assigns it to a variable whose name is the module name:

^##math
math.cos 0 == 1

^# as part of an expression evaluates to the imported module, and can be used to eg assign the module to a different name:

mymath = ^#math

Names within a module may also be accessed via ^# within an expression:

mycos = ^#math.cos

Or multiple names at once:

mycos, mysin = ^#math.[cos sin]

^#! is 'raw' import, which means just including the text of the destination file at this point within this file as if it were typed here:

if filename.txt contains "x" then

x = "Hello World!"; P (^#! filename.txt)

is equivalent to:

x = "Hello World!"; P (x)

Note: it is a syntax error to have unbalanced grouping in either the containing file or the raw imported file; eg you cannot have "{x = ^#! filename.txt" in the containing file and "1}" in filename.txt

All ^# and ^#! imports are compile time by default. To do a run-time import, use ^#$ and ^#!$ instead. Most run-time import metaprogramming does not affect the containing code (the code outside of the modules/files which are being imported).

# is for footnotes

In order to make it easier to quickly skim an unfamiliar function, footnotes provide a mechanism to separate the exposition of the 'main idea' of a piece of code from 'details' (for example, error handling). Example:

x = 10
y = if (errorCondition): handleError else: doSomething x
print ('y = ' + str(y))

can also be written as:

x = 10
y = doSomething x #1
print ('y = ' + str(y))

#1= if (errorCondition): handleError else:

When footnotes are used, the intended implication is that the reader is encouraged to first read the block of code without bothering to consult the footnotes, to get an idea of what is generally going on, and only afterwards to delve into the footnotes. Footnotes which are not just 'details' but which rather change the 'basic idea' of what is going on are considered to be an abuse of this facility; ordinary macros should be used for this instead.

@ is for binding parts of an expression to a name

todo

This is used in graph constructors, and in pattern matching.

Suspension

Blocks are suspensions (they are 'run' with '!').

TODO (some applications: 0-ary and keyword)

TODO apply--

equals is for assignment

Although we emphasize immutable data and eschew unexpected side-effects, we permit 'local mutation', for example, mutable variable assignment, and creation of new values by operations that look syntactically like 'mutating' old values. This is because these can be converted, through simple syntactic transformations that are local to one function, into purely functional code.

The values themselves are immutable (except for values stored in 'reference variables', todo what's the syntax/sigil for those?). Therefore these 'mutations' have no side-effects aside from the rebinding of the indicated variable to a new value; in particular, mutating a parameter passed into a function has no effect outside of that function, and if you do "a = b", then the value of b is copied into a, and mutations to 'b' will have no further effect on 'a'.

What is dangerous is not these local violations of referential transparency, but global ones; for example, hidden aliasing between variables, especially across function boundaries.

The '=' sign is used for assignment. When a simple assignment is made (a simple assignment is when the lhs (left hand side) consists of just a variable name), the variable is rebound to the value on the rhs (right hand side).

If a composite assignment is made, one of the form "f x", where f is any expression, then a new function value 'g' is created which is like f, except that on input x, the value returned by 'g x' is the rhs; then this new function is assigned to f (as if "f = g" were written; if 'f' is itself composite, this causes a recursion).

eg:

x = 3
x = 4
x == 4
x = 3
x = 4
f =   [1 2 3]
f 1 = 5
f ==  [1 5 3]
f2 =  [10 [1 2 3] 30]
f2 1 1 = 5   ;; first, (f2 1) 1 = 5, so a g is created such that g = (f2 1) except that g 1 = 5; then we do (f2 1) = g, and recurse
f2 == [10 [1 5 3] 30]
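The composite-assignment rule (build a new value that differs at one point, then rebind the variable) can be sketched in Python as a non-mutating update on nested lists. The helper name `assign` is hypothetical, and lists stand in for Oot graphs.

```python
def assign(value, path, new):
    """Return a copy of value in which the element at path is replaced by new.

    Nothing is mutated: every list along the path is copied, mirroring the
    'create g like f except at x, then rebind f' description.
    """
    if not path:
        return new
    head, rest = path[0], path[1:]
    copy = list(value)
    copy[head] = assign(value[head], rest, new)  # recurse for composite targets
    return copy


f = [1, 2, 3]
f = assign(f, [1], 5)          # like 'f 1 = 5'

f2 = [10, [1, 2, 3], 30]
old_f2 = f2
f2 = assign(f2, [1, 1], 5)     # like 'f2 1 1 = 5'
```

After this, `f` is `[1, 5, 3]` and `f2` is `[10, [1, 5, 3], 30]`, while the list that `old_f2` still refers to is unchanged, which is the "no hidden aliasing" property the text describes.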

TODO , if the class of the value on the lhs (left hand side) of the assignment provides a function named "set--", then this is called. TODO: "the class of"?

Multiple equals are for equality comparisons

Two equals signs, '==', is the boolean 'is' function as a binary comparison operator (todo).

Three equals signs, '===', is structural equality.

Four equals signs, '====', is pointer equality.

// is like an arrow

Think of '//' as a shorthand for '-->'. It comes up in constructs like 'cond':

For example:

i = 1
j = 2
cond: i
  0     // "zero"
  1     // "one"
  j     // "two"

or

i = 1
j = 2
cond: i
  i == j  // "equal"
  i -= j  // "not equal"

It also comes up in other situations where there is a list of rules, where the part to the left of the '//' is the condition under which the rule applies, and the part to the right of the '//' is what is true (or what should be executed) when the condition is true.

Overview of data

Because of their centrality to Oot, graph constructors have already been covered above.

Atomic Literals

Atomic literals consist of ints (3), floats (3.0), strings ("hi"), booleans (TRUE), and NIL.

Some Oot implementations may support Unicode in source files. In this case, Unicode is permitted only within string literals.

String literals by default support interpolation with '$' and character escaping with '\'; within an ordinary string literal, '$' and '\' and '"' must be themselves escaped by '\'. There are also 'raw string literals', of the form r"". Raw string literals do not support escaping, and so cannot contain the character '"'. (todo: should we instead do what Python does [1]? why does it do this?)

Rooted tree addressing

Some graphs are rooted trees (for example, a file hierarchy). In an Oot rooted tree, if you have a variable containing any node in the tree, it is possible to get the parent node of that node, or the root of the tree.

Say variable 'r' contains a graph which is a rooted tree.

'r' refers to the root.

'r a' refers to node A which is a child of the root node, 'r a b' refers to node B which is a child of node A which is a child of the root.

Say 'x = r a b'.

'x ..' refers to the parent node of x, that is, to the value 'r a'.

'x ...' refers to the root node of the tree containing x, that is, to the value 'r'.
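As a rough Python analogy for rooted tree addressing, each node can keep a link to its parent. The class `Node` and its method names are hypothetical and exist only for this illustration.

```python
class Node:
    """A node in a rooted tree that remembers its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent      # like 'x ..'
        self.children = {}

    def child(self, label):
        """Return the child with this label, creating it on first use."""
        if label not in self.children:
            self.children[label] = Node(label, parent=self)
        return self.children[label]

    def root(self):
        """Walk parent links up to the root, like 'x ...'."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


r = Node("r")
x = r.child("a").child("b")   # like 'x = r a b'
```

Here `x.parent` is the node reached by `r.child("a")`, and `x.root()` is `r`.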

In addition, as a syntactic convenience, there is a semiglobal (see below) called '--cwn--'. To refer to a child of the node in --cwn--, prefix an identifier or glob with '.'. For example, the following will set variable 'y' to the value r a b:

--cwn-- = r a
y = .b

as a convenience, at program initialization, '--cwn--' is set to 'filesystem cwd'. Which means you can do:

somefiles2 = [.something*.txt .somethingElse*.txt]
somefiles2 = .something*.txt + .somethingElse*.txt  ;; equivalent

to refer to the list of files that is the concatenation of the list of files of the form something*.txt in the current working directory of the filesystem, and the list of files of the form somethingElse*.txt in the current working directory of the filesystem.

Globs

Globs are created by asterisks in code or data context.

In code context, globs are only valid when used in an expression prefixed by a '.' (see Rooted tree addressing, above). '.glob' is short for '[.glob]'.

In data context, globs insert one or more items by finding all nodes whose labels match the glob. For example, if cwd-- (see Rooted tree addressing, above) contains nodes 'something.txt, something1.txt, tuesday.txt, somethingElse.txt', then

[.something*.txt .somethingElse*.txt]

evaluates to

[(cwd-- something.txt) (cwd-- something1.txt) (cwd-- somethingElse.txt)]
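A Python sketch of this glob expansion, using the standard `fnmatch` module, is shown below. The function name `expand_globs` is hypothetical. Note one assumption: 'something*.txt' also matches 'somethingElse.txt', so a plain concatenation would list that node twice; since the result above lists it once, this sketch drops duplicates.

```python
from fnmatch import fnmatchcase


def expand_globs(labels, patterns):
    """Return the labels matching any pattern, in pattern order, without repeats."""
    matches = []
    for pattern in patterns:
        for label in labels:
            # fnmatchcase is case-sensitive on every platform.
            if fnmatchcase(label, pattern) and label not in matches:
                matches.append(label)
    return matches


nodes = ["something.txt", "something1.txt", "tuesday.txt", "somethingElse.txt"]
found = expand_globs(nodes, ["something*.txt", "somethingElse*.txt"])
```

With these inputs `found` contains something.txt, something1.txt and somethingElse.txt, in that order.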

Semiglobals

A semiglobal is a dynamically scoped threadlocal variable. If you change a semiglobal and then call a function, that function sees the change you made. But if the function you called further changes the semiglobal and then returns, you do not see the change it made.

A semiglobal is marked by a prefix --, for example:

maybe-round-then-print = {if (--should-round-before-printing) {X = int (round X)}; print X}

;; note that at this point, --should-round-before-printing == NIL


maybe-round-then-print 5.3            ;; prints 5.3

f = {--should-round-before-printing = T; maybe-round-then-print X}
f 5.3                                 ;; prints 5
     ;; at this point --should-round-before-printing is still NIL,
     ;; because a change to a semiglobal doesn't affect environments
     ;; higher in the callstack
maybe-round-then-print 5.3            ;; prints 5.3
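The semiglobal behaviour (callees see the caller's change, callers do not see the callee's change) can be approximated in Python with the standard `contextvars` module. This is an analogy under assumptions: the helper `call` is hypothetical and must be used for every call that should get its own dynamic scope, and the functions return values instead of printing so the result can be checked.

```python
import contextvars

# Stand-in for --should-round-before-printing; None plays the role of NIL.
should_round = contextvars.ContextVar("should_round", default=None)


def call(fn, *args):
    """Run fn in a copy of the current context, so its changes do not leak out."""
    return contextvars.copy_context().run(fn, *args)


def maybe_round(x):
    return int(round(x)) if should_round.get() else x


def f(x):
    should_round.set(True)        # visible to functions called from here
    return call(maybe_round, x)


first = call(maybe_round, 5.3)    # not rounded
second = call(f, 5.3)             # rounded inside f
after = should_round.get()        # f's change did not propagate back up
third = call(maybe_round, 5.3)    # not rounded again
```

The design choice here is that isolation happens at call time (the copy in `call`), which is what makes a callee's change invisible once it returns.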

ADTs

= for In-place mutations

Often you want to say:

x = f x y

A shortcut for this is:

x =f y

For example,

x =+ 1

is short for

x = x + 1

In addition, in some cases the compiler might recognize this as an 'in-place' mutation, eg:

aList =append newItem

which is short for:

aList = append aList newItem

One-letter shortcuts

Some of these are just shorthand for a longer item (eg. T and F are short for the boolean literals TRUE and FALSE). Some of them are common infixified functions (eg. M is infixified 'map'). Note that some of them can be put in the middle of a contiguous alphanumeric, eg "3Umeter" to represent the value 3 with unit "meter", or even "variable1Umeter" to indicate the value in variable1 with unit "meter".

They do not associate with each other (ie if you have more than one of them next to each other, you must use explicit grouping such as parentheses to say the order of operations). And some are more complex special forms (for example 'S' may be repeated multiple times to indicate search-and-replace)

Note that although these are the only alphanumeric 'reserved words' (alphanumeric special forms) in Oot, there are various other alphanumeric control-flow constructs (eg 'cond') which would be special forms in other languages but which do not need to be in Oot.

One letter shortcuts summary table

Often we capitalize a letter on the right side of the table to show a memorization aid.

The second column shows what sort of thing this is. alias: just an abbreviation. todo: i havent decided yet. metaoperator: postfix unary transformation of operators. 2-ary/3-ary fn: acts as an infix fn but can also become ternary with repetition. language: does something more complex than any of the other categories. Note that 'fn' one-letter shortcuts act like ordinary (prefix) functions when unattached, or infix functions when attached; eg "R 0 10" == "0R10".

A    metaop      postfix unary       Accumulate
B
C    alias       binary fn           Cross-product
D    todo        todo                reDuce
E    alias       unary fn            raise Exception; short for exceptionShortcut where comment exceptiontype/exception = exceptionShortcut = raise new(exceptiontype str(comment))
F    alias       identifier          FALSE
G
H
I    alias       binary fn           <Is>; combination of ==, isInstance, isSubType (is instance of subtype), has attribute equal to, eg myException I Exception; eg bob I BLUE
J    todo        todo                list of plucks
K    todo        todo                plucK
L    todo        todo                fiLter
M    metaop      postfix unary       eleMentwise
N    todo        todo                patterN constructor
O
P    alias       unary fn            Print str(X)
Q    todo        todo                Query
R    fn          binary fn           Range
S    todo        todo                Substitute
T    alias       identifier          TRUE
U    language    binary fn           <Unit>(uppercased with abbrevs); eg 3Um == (3 <unit> METERS)
V    language    binary fn           <View>; see section below
W    todo        part of Z           the 'with' in zipWith
X    language    identifier          implicit argument
Y    language    identifier          implicit argument
Z    alias       2-ary or 3-ary fn   <Zip>

One letter shortcuts details

A is for Accumulate. A transforms '+' into 'sum of a list' and '*' into 'product of a list' (more generally, A folds the operator over a list, taking (operatorIdentity operator) as the initial value if it is defined, and nil as the initial value otherwise):

+A [1,2,3,4] == 10
*A [1,2,3,4] == 24
&A == all
|A == any
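A Python sketch of what the Accumulate metaoperator does: fold an operator over a list, starting from the operator's identity when one is known. The table `IDENTITY` and the name `accumulate` are assumptions made for this illustration, and Python's `None` stands in for nil.

```python
import operator
from functools import reduce

# Assumed stand-in for (operatorIdentity operator).
IDENTITY = {
    operator.add: 0,
    operator.mul: 1,
    operator.and_: True,
    operator.or_: False,
}


def accumulate(op, items):
    """Fold op over items, starting from its identity (or None if unknown)."""
    return reduce(op, items, IDENTITY.get(op))


total = accumulate(operator.add, [1, 2, 3, 4])     # like +A [1,2,3,4]
product = accumulate(operator.mul, [1, 2, 3, 4])   # like *A [1,2,3,4]
```

Starting from the identity also gives sensible answers for empty lists, for example 0 for an empty sum.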

B is reserved.

C for Cross-product. (todo do we need this? it's most useful in math types..)

[1 2] C [3 4] == [[1 3] [1 4] [2 3] [2 4]]

D for reDuce: todo? do we need this at all? 'fold' isn't too hard to type, and we already got Accumulate for the most obvious metaoperator case. Even if so, maybe assign to R.

E constructs and throws an Exception; E(X) is short for raise(exception(str(X))) (todo: or rather, short for exceptionShortcut where comment exceptiontype/exception = exceptionShortcut = raise new(exceptiontype str(comment))); ordinarily, 'exceptiontype' takes a comment as an optional argument)

if x < 10:
  E "x is too small!"

F is just an abbreviation for the literal FALSE

G is reserved for now

H is reserved for now

I is infixified 'is'

J is for 'list of plucks':

xJ[FIELDNAME1 FIELDNAME2] == map x (y => [FIELDNAME1/(y FIELDNAME1) FIELDNAME2/(y FIELDNAME2)])

K for plucK:

xKFIELDNAME == map x (y => y FIELDNAME)

L for fiLter:

[1 2 3 4]Leven == [2 4]

M for eleMentwise:

[1,2] +M [3,4] == [4,6]
elementwise_addition = (+M); elementwise_addition [1,2] [3,4] == [4,6]

N is a constructor for patterNs

O is not used (but still reserved) because in some fonts it looks too much like 0 (zero)

P is short for 'print!':

P "Hello world!"

Q for Query. Q is a special form (todo is this really a good idea?); Q may be repeated twice for search-and-replace (if 'Q' is needed within the arguments to Q, escape those with a backslash). On strings, this uses regular expressions, but more generally it can be used for database or other queries (replacing the string value in the left argument with a database accessor value). Eg:

success = "Hello there Alice!" Q "Hello .* {name}(\w+)!"
(success == TRUE) A (name == "Alice")

replaced = "Hello there Alice!" Q "Hello .* {name}(\w+)!" Q "I said 'Hello' to {name}."
replaced == "I said 'Hello' to Alice."
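On strings, the match and search-and-replace behaviour of Q can be approximated with Python's `re` module. This is a sketch under assumptions: the names `to_python_regex` and `query` are hypothetical, and the `{name}(...)` capture notation is translated to Python's named groups `(?P<name>...)`.

```python
import re


def to_python_regex(oot_pattern):
    """Translate '{name}(' captures into Python named groups '(?P<name>'."""
    return re.sub(r"\{(\w+)\}\(", r"(?P<\1>", oot_pattern)


def query(text, pattern, template=None):
    """With no template: return (success, captures). With one: search and replace."""
    m = re.search(to_python_regex(pattern), text)
    if template is None:
        return (m is not None, m.groupdict() if m else {})
    if m is None:
        return text
    # Fill the {name} placeholders in the template from the captured groups.
    return text[:m.start()] + template.format(**m.groupdict()) + text[m.end():]


success, captures = query("Hello there Alice!", r"Hello .* {name}(\w+)!")
replaced = query("Hello there Alice!", r"Hello .* {name}(\w+)!",
                 "I said 'Hello' to {name}.")
```

Unlike the Oot example, the captured name is returned in a dictionary here rather than bound as a variable in the caller's scope.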

R creates a range of numbers (this is a shortcut for the function 'range'; if you need the optional third argument, 'step', then use 'range' instead):

1R10 == [1 2 3 4 5 6 7 8 9]

S is for Substitute. On strings, this is like sprintf in C or '%' in Python, although more generally it can be applied to other things too, such as symbolic expressions: (todo would we prefer $ or $$ for this?)

"My name is %s and my age is %d" S (name, age)

T is the literal TRUE

U is for Unit: eg "3Um" means '3 with units of meters': xUm == (unit (--unitabbrevs 'm') x)

V is for View:

print([apple/RED lemon/YELLOW]VITEMS)

[[apple RED] [lemon YELLOW]]

(todo: is the syntax for that right, eg is only the first uppercase letter in a mixed-case token the macro operator?)

W is an optional 'argument' for Z that turns it into zip-with (todo is this really worth spending a one-letter on?)

[1 2] Z [3 4] W + == [4 6]

X is a pronoun that acts like a lowercase identifier, but implicitly transforms its containing explicit block into an anonymous function taking a single argument, 'X':

{X > 0} == {x => x > 0}

Y is reserved for future use. (todo: now i'm changing my mind and thinking it should be a second implicit argument again)

Z for zip; Z may take the optional argument 'W' on the right, see above:

[1 2] Z [3 4] == [[1 3] [2 4]]

Parentheses allow multiword expressions within a one-letter macro:

[-5,5,-10,100]F(X > 0) == [5,100]
[1 4 9]F(mod X 2 == 0)Msqrt == [2]

Multiple one-letter macros bind to their arguments, and then those chunks associate to the left. This allows you to implement 'list comprehensions':

[1 4 9]FevenMsqrt == [2]
list1 = [[NAME/'orange' COLOR/'orange' TYPE/'plant'] [NAME/'dog' COLOR/'tan' TYPE/'animal']]; list1F(X TYPE == 'plant')M([(X NAME) (X TYPE)]) == [[NAME/'orange' TYPE/'plant']]
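As a point of comparison, the first chain above can be written in Python either as nested `filter`/`map` calls or as a list comprehension. The helper `even` is defined here because Python has no built-in of that name. Python's `math.sqrt` returns floats, so the result is `[2.0]` rather than `[2]`.

```python
import math

def even(n):
    return n % 2 == 0

# [1 4 9]FevenMsqrt, read left to right: filter, then map.
chained = list(map(math.sqrt, filter(even, [1, 4, 9])))

# The same thing as a Python list comprehension.
comprehension = [math.sqrt(n) for n in [1, 4, 9] if even(n)]
```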

Unattached one-letter macros operate as if there were infix attached:

[1 4 9] F even M sqrt == [2]

Attached infix one-letter macros bind tighter than unattached one-letter macros (todo example): [10 11] Z [1 2]Z[3 4]W+ == [[10 4] [11 6]]

Attached prefix or suffix one-letter macros are partially applied:

[1 4 9]F == (filter [1 4 9])
Feven == (filter __ even)
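A loose Python analogy for the two partial applications above. Python's `filter` takes the predicate first, so `functools.partial` covers the 'Feven' case directly; the 'list supplied, predicate open' case needs a small helper, here called `filter_list` (a made-up name).

```python
from functools import partial

def even(n):
    return n % 2 == 0

# [1 4 9]F : the list is supplied, the predicate is still open.
def filter_list(items):
    return lambda pred: [x for x in items if pred(x)]

# Feven : the predicate is supplied, the list is still open.
filter_even = partial(filter, even)

from_list_side = filter_list([1, 4, 9])(even)
from_pred_side = list(filter_even([1, 4, 9]))
```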

todo: check out Perl's 'metaoperators' again and see what else we're missing

todo: and what about that Ruby thing that is like a partially-applied __get?

V is for view

Oot encourages immutable values and discourages implicit aliasing between variables. However, there are times when it is convenient to use the 'adaptor pattern' to expose multiple views of the same data. For example, perhaps you have a function that filters and deletes unwanted nodes from a list of nodes, and returns a list of nodes; and you have a tree datastructure. You want to select and delete nodes from that tree. What you would like to do is to pass in the tree datastructure to the filtering function; but since the filtering function only takes lists, you will have to pass it in the form of a list. However, at the end, you want to get back a tree, not a list; and that tree must be the same shape as the original tree, only without the deleted nodes.

Here are four ways to solve this:

(1) With mutations and aliasing: one way to handle this is to create a second object of type list. Mutations are applied by the filtering function to this second object via method calls. However, this second object is implicitly aliased to the tree, so that performing a mutation upon the list causes the corresponding mutation to be applied to the tree.

(2) Without mutations or aliasing or interfaces: another way to handle this is to create a conversion function that traverses the tree and produces a list of nodes, with each node containing metadata that identifies to which node in the tree it corresponds. Then pass this list to the filtering function. Then take the resulting list of nodes and compare it to the tree to see which nodes are missing, and delete them from the tree. This is rather cumbersome.

(3) Without mutations or aliasing but with interfaces: a third way is to create a function called adapt-tree-to-list. This returns something which appears to be an immutable list, in that it exposes the list interface. However, under the covers, it is actually a tree. This data structure is passed to the filtering function, which operates on it just like any other list, and then returns the result. This result is then passed thru a function unadapt-tree-to-list which recovers the hidden tree.

(4) With spatial interfaces instead of interface signatures: What about when the 'interface' primarily consists, not of a set of function signatures (the way we typically use the word 'interface' in Oot), but rather just certain conventions and assumptions about the shape of the data? In this case, the same pattern can be used, but the functions that translate operations on the list to operations on the tree spatially remap the operations rather than translating them to a different signature.

Oot provides special support for the third and fourth patterns, in order to make these sorts of things easier to read and to write.

The 'V' one-letter operator selects a view for a variable.
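To make pattern (3) concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Tree` type, the `TreeAsList` view, the `keep` operation standing in for 'the list interface', and the choice that dropping a node also drops its subtree. It is not how Oot views are specified.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Tree:
    value: int
    children: Tuple["Tree", ...] = ()


class TreeAsList:
    """Looks like an immutable list of values; secretly carries a tree."""

    def __init__(self, tree: Optional[Tree]):
        self._tree = tree

    def __iter__(self) -> Iterator[int]:
        def walk(t: Tree) -> Iterator[int]:
            yield t.value
            for child in t.children:
                yield from walk(child)
        return walk(self._tree) if self._tree is not None else iter(())

    def keep(self, pred: Callable[[int], bool]) -> "TreeAsList":
        # Assumption: dropping a node also drops its subtree.
        def prune(t: Tree) -> Optional[Tree]:
            if not pred(t.value):
                return None
            kids = tuple(k for k in map(prune, t.children) if k is not None)
            return Tree(t.value, kids)
        return TreeAsList(prune(self._tree) if self._tree is not None else None)


def adapt_tree_to_list(tree: Tree) -> TreeAsList:
    return TreeAsList(tree)


def unadapt_tree_to_list(view: TreeAsList) -> Optional[Tree]:
    return view._tree


def drop_negatives(xs):
    # Knows only the list-like interface (iteration and keep), not trees.
    return xs.keep(lambda v: v >= 0)


tree = Tree(1, (Tree(-2, (Tree(3),)), Tree(4)))
result = unadapt_tree_to_list(drop_negatives(adapt_tree_to_list(tree)))
```

The filtering function never sees the tree, the original `tree` value is never mutated, and the caller still gets a tree back at the end.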

$ is for variables and substitution

$ involves the passage from variables to their values.

Prefix $ indicates interpolation. In ordinary code, "x" would be replaced by the value of variable x when it is evaluated; but in some other contexts, such as within a string, "x" is left as-is and is not substituted; within some of these contexts (such as within a double-quoted string), $x indicates that 'x' should nevertheless be treated as a variable and should evaluate to the value of the variable named 'x'. For example,

x = "World!"
P "Hello $x"

Within a 'quoted' portion of the AST, $x is 'antiquote'.

(to go the other direction, that is, to refer to the VARIABLE 'x' within code without immediately evaluating it, is "?x"; in this sense "$x" and "?x" are opposites)

'$' unattached is the name of our 'default variable' or 'pronoun', which functions like 'it' in the English language (or like the 'default variable', $_, in Perl). The function of variable '$' is to save you a little time typing. When a function takes a variable name as an optional argument, typically by convention the default variable is '$'. For example, some control flow constructs, such as iterators, take the name of a variable to bind; if no variable name is given, they use '$' as the variable. For example:

for [1 2 3] {P $}

is equivalent to:

for [1 2 3] $ {P $}

todo: what about the rule that required args must come before optional? does 'for' check the argument type and know that a block cannot be a variable name, and a variable name cannot be a block? or mb in general the 'required before optional' rule is only a rule when the types are not disjoint.

Postfix $ is used to control evaluation strategy. "x$" indicates that x is not to be immediately evaluated, but rather is to be evaluated leniently. For example,

fibs = 0 : 1 : zipWith (+) fibs$ (tail fibs$)

(todo: this example is mostly from Haskell because i haven't yet decided upon the syntax for cons (':' in Haskell), nor whether there is syntax for zipWith or tail)

Since 'fibs' is an infinite sequence, it would cause an infinite loop were it to be actually constructed when the RHS of this definition is evaluated.

Using postfix $ on a variable being assigned to on the LHS attaches a 'lenient flag' (again, by 'lenient' we actually just mean non-strict) to the value. This prevents the value produced from being evaluated eagerly by default when it is produced in other expressions. For example:

fibs$ = cons 0 (cons 1 (zipWith (+) fibs (tail fibs)))

(todo: fix syntax again)

Here, the instances of 'fibs' on the RHS aren't marked as lenient, but when they are evaluated, the runtime sees that their value is a thunk with the 'lenient' flag set, and so does not recurse into them (at least not infinitely).
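Python is strict and has no lenient flag, so the closest everyday analogue is a generator, which likewise produces only as many elements as the consumer asks for. This sketch illustrates the 'infinite sequence, finite work' idea only; the mechanism is different from the flagged thunks described above.

```python
from itertools import islice

def fibs():
    # An unbounded sequence; nothing is computed until a value is requested.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Taking a finite prefix forces only that many elements.
first_ten = list(islice(fibs(), 10))
```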

The lenient flag provides a mechanism to pass infinite, lenient data structures into ordinary functions without causing an infinite loop, eg:

integers$ = R 0 INF
cumsum arr = {y = 0; for arr x {y = y + x; cons y (cumsum (tail arr))}}
cumsum_of_integers = cumsum integers 

Here, the value assigned to the variable 'integers' is marked as lenient. 'cumsum' does not need to know about this, because the runtime detects that one of its arguments, 'arr', is lenient, and that it is recursing multiple times into 'arr'; this causes the value 'cumsum arr' to itself be marked as lenient. Evaluation of cumsum_of_integers is done in a lenient manner, that is, only a finite number of evaluations will be done each time a value is needed.

(actually, in this case 'integers' without the postfix $ would work, because the range function 'R' checks if its arguments are infinite, and if so, marks its result as lenient)
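A rough Python counterpart of the cumsum example, using `itertools`. Here the laziness is explicit (iterators) rather than propagated by a runtime flag, so this only shows the intended observable behaviour: a cumulative sum over an infinite sequence, of which a finite prefix is demanded.

```python
from itertools import accumulate, count, islice

integers = count(0)                        # 0, 1, 2, ... never fully built
cumsum_of_integers = accumulate(integers)  # still lazy

# Only five elements of either sequence are ever computed.
first_five = list(islice(cumsum_of_integers, 5))
```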

todo again, fix the cons syntax

The opposite of "x$" is "x$!". "x$!" strips the lenient flag from the value of x and causes x to be evaluated strictly.

Using postfix $ or $! on a parameter name on the LHS marks that parameter to be evaluated leniently or strictly when a function is called. Eg:

f x$ y$! = x + y

will create a function f that evaluates its first argument leniently and its second argument strictly.
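In a strict language like Python, a lenient parameter can be imitated by passing a zero-argument function (a thunk) and calling it only when the value is needed. The sketch below does that for `f x$ y$!`; the `expensive` helper and the `log` list are made up so that the order of evaluation is visible.

```python
log = []

def expensive(tag, value):
    # Records when each argument is actually computed.
    log.append(tag)
    return value

def f(x_thunk, y):
    # y arrives already evaluated (strict); x is forced only here.
    return x_thunk() + y

result = f(lambda: expensive("x", 1), expensive("y", 2))
```

The strict argument is computed before the call, and the lenient one only inside the body, which is why "y" is logged before "x".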

The default evaluation strategy is to check the leniency flag of the value and evaluate it leniently if it is set, or strictly otherwise. This can be overridden by postfix $, which causes lenient evaluation; postfix $! in turn overrides $ and causes strict evaluation. There are three places that postfix $ and $! can occur: within an expression (ie RHS); on a parameter in the LHS of a function definition; or on a pattern being matched in the LHS of a variable assignment. A postfix $ in any of these three places overrides the default strictness and causes leniency; if any of these three places has a postfix $! then that overrides any postfix $ and causes strict evaluation.

In more detail, by default, values are evaluated strictly if they have no lenient flag, or leniently if they do. Strict evaluation means that the value is evaluated when a function is applied to it or when it is assigned to a variable. However, when a value is bound to a term in an expression with a postfix '$', or when a function being applied to it has a postfix $ on the relevant parameter, or when the pattern it is being matched against has a postfix $ on the LHS, this overrides the default of strictness and the value is evaluated leniently. This leniency is itself overridden if the value is bound to a term in an expression with a postfix '$!', or a function being applied to it has a postfix $! on the relevant parameter, or the pattern it is matched against in the LHS has a postfix $!.

An even more extreme evaluation strategy is given by postfix $^ annotations. These preserve not just the value but the actual AST of the expression producing the value, preventing the Oot implementation from optimizing or pre-reducing the AST in any way, at least until the expression is actually evaluated. (todo: is this what call-by-text and fexprs are? if so, say so). This is used to preserve expressions to serve as inputs to 'run-time macros' (see below, todo). Note that this only preserves the expression, it does not actually quote it; the quote operator (^^ (?todo)) is still needed to do that. Eg:

x = 3
y = 1 + x
z = 2 + y
ast = ^^(z)

yields the ast of the expression "z", while

x = 3
y = 1 + x
z = 2 + y
ast = ^^($z)

yields the ast of the expression "6", while

x = 3
y$^ = 1 + x
z = 2 + y
ast = ^^($z)

yields the ast of the expression "2 + 1 + x", while

x = 3
y = 1 + x
z =$^ 2 + y$^
ast = ^^($z)

yields the ast of the expression "2 + y".

(todo doublecheck those)
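The first two cases (quote versus quote-with-antiquote) can be imitated with Python's standard `ast` module. This covers only those two cases; Python has no counterpart to the $^ annotation, so the last two examples are not reproduced.

```python
import ast

x = 3
y = 1 + x
z = 2 + y

# ^^(z): quote the expression itself, without evaluating it.
quoted = ast.parse("z", mode="eval").body

# ^^($z): antiquote, so the current value is spliced into the AST.
antiquoted = ast.Constant(value=z)
```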

Expressions included by $^ are 'unhygienic' and can refer to or reassign local variables in the caller's scope, as well as execute control flow (such as 'return') in the caller's scope.

Pass-by-reference is indicated by postfix '$&' (todo how about just postfix &?). For example,

swap-in-place a$& b$& := {c = a; a = b; b = c}
a = 1; b = 2
swap-in-place a$& b$&
a == 2
b == 1
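Python has no pass-by-reference for plain variables, so the usual way to imitate the example above is an explicit mutable cell. The `Ref` class is a made-up helper; having to wrap the values at the call site plays a role similar to writing $& there.

```python
class Ref:
    """A one-slot mutable cell standing in for a by-reference variable."""

    def __init__(self, value):
        self.value = value

def swap_in_place(a, b):
    # Mutates the caller's cells rather than local copies.
    a.value, b.value = b.value, a.value

a, b = Ref(1), Ref(2)
swap_in_place(a, b)
```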

$& must be affixed BOTH in the parameter within the function definition, AND at the callsite to be effective; if it occurs only at the function definition but not at a particular call-site, that particular call will be pass-by-value; but if it occurs only at the call site but not in the function, this is a type error. For example,

swap a$& b$& := {c = a; a = b; b = c; b,a}
a = 1; b = 2
swap-in-place a$& b$&
a == 2
b == 1


Also, /$ is the syntax for 'leaving open' an optional argument to a function while binding some of its other arguments.

Types of statements

a                assertions
x = a            assignments
f x := b         function definitions
o1 o2 = f x := b function definitions with multiple outputs
^#name           import
^@@LABEL         label

A statement that does not meet any of the above forms (except the first) is an assertion. The assertion is an expression which when evaluated is expected to be TRUE (and throws an assertionError if it is not).

~ is for the underlying platform, approximations, shortcuts, and special cases

Prefix ~: if you declare something an '~int' instead of an 'int', you might get a platform-specific approximation to an Oot int. If so, this will be interoperable with non-Oot code on the same platform, and it may be faster to work with than Oot ints, but it may also have different precision or semantics; for example it may overflow when an Oot int would not (in this case, an error would be raised; the overflow would not be silent). You can also apply prefix ~ to operations, eg ~sqrt is platform-specific sqrt. Whenever you execute approximate or platform-specific operations or calculate with approximate or platform-specific types, you must be prepared for PlatformSpecificApproximationErrors to be raised. For example, if you used an ~int and got an 8-bit integer, and assigned 200 to it and then added 100 to it, this would overflow and raise a PlatformSpecificApproximationErrors (subtype PlatformSpecificOverflowError).
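The 8-bit example can be sketched in Python as a checked addition. The exception names are taken from the text above; the function `add_u8` and the assumption that the platform integer is unsigned (range 0 to 255) are made up for illustration.

```python
class PlatformSpecificApproximationErrors(ArithmeticError):
    """Base class for platform-specific approximation failures."""

class PlatformSpecificOverflowError(PlatformSpecificApproximationErrors):
    """Raised instead of silently wrapping around."""

def add_u8(a, b):
    # Assumption: the platform '~int' turned out to be an unsigned 8-bit value.
    result = a + b
    if not 0 <= result <= 255:
        raise PlatformSpecificOverflowError(f"{a} + {b} does not fit in 8 bits")
    return result

try:
    add_u8(200, 100)
    overflowed = False
except PlatformSpecificApproximationErrors:
    overflowed = True
```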

Overview of lenient evaluation

todo

Overview of (graph) patterns and destructuring bind

patterns are first class

OOP

& is for impurity

By default, Oot functions, expressions, and variables are 'pure' and/or 'referentially transparent'. There is no technical consensus on what these mean [2] (especially given differing definitions of 'value' [3]), but for our purposes purity/referential transparency means a computation which neither reads from nor writes to external state (it has no side-effects and is deterministic). By default, Oot variables hold immutable values; since values are immutable they cannot be 'aliased', or referenced by two different variables such that a mutation to one variable also affects the value when seen thru the other variable.

Oot functions and variables which are not pure ('impure') are prefixed with '&' to make this visually apparent.

There are some exceptions to this:

Overview of Oot tooling and standard libraries

The Oot implementation

There is a single canonical Oot implementation (or rather, family of Oot implementations for different targets).

By default, Oot interprets code, but can optionally compile it instead. Oot provides a REPL.

The Oot interpreter is designed to start quickly, enabling it to be used for commandline scripts.

The Oot REPL can run on, and Oot can be compiled to, many platform targets, including:

(new platforms may be added and old ones dropped depending on their popularity)

Depending on the target, static or dynamic linking, or both, may be available. On native targets, we offer at least static compilation.

Oot is self-hosting, and is bootstrapped from a small, portable core.

A compiled Oot program (as well as the Oot REPL, in some cases) can marshal and unmarshal data from, and can call and be called from, other programs and libraries on its target platform, enabling it to use a wide variety of other libraries, and to be used to create libraries in a wide variety of platforms; as well as to serve as a 'scripting language' within larger programs. For example, when an Oot program or library is compiled to the JVM, it can call or be called by other JVM programs or libraries. Oot programs and libraries on different targets can talk to each other, enabling Oot to be used as a 'glue language'.

For situations where the standard Oot runtime, tooling, or language features are too large, too slow, or too complex, Oot provides alternate 'profiles' with stripped-down functionality.

Oot manages memory (ie the programmer doesn't have to explicitly allocate and deallocate memory). The memory-management architecture is opaque (eg some Oot releases might use reference counting while others use tracing garbage collection) and provides no guarantees except to avoid crashes from invalid memory access in managed code. Informally, however, Oot primarily aims to minimize long 'pauses', and secondarily to prefer less memory usage, both at the expense of serial throughput (perhaps someday these choices will be configurable).

The initial implementation is 'stackless', in the sense that activation records (call stack frames) are allocated on the heap. This makes it simpler to implement closures (upvalues can survive after the exit of the function that contained them, if they are still referenced in a closure), continuations (it's easy to create references or copies of the state of the call stack, and to swap these in for later use, since there is no distinction between these and the 'real stack'), and large numbers of greenthreads (you don't have to allocate memory for the entire stack when creating a greenthread, nor do you have to bother with a special routine to check if the stack is almost full, and if so to expand it), at some cost to execution speed. Perhaps later the compile option will in some cases do something like Go's contiguous stacks as an optimization.
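To illustrate what 'activation records on the heap' buys, here is a toy Python sketch in which each call frame is an ordinary heap object kept in a list, instead of a slot on the native call stack. It is only an illustration of the idea, not a description of the Oot or OVM implementation; `sum_to` and the frame layout are invented for the example.

```python
def sum_to(n):
    # Each 'call' gets a heap-allocated frame (a dict) instead of a native
    # stack slot, so depth is limited by memory, not by the native stack.
    frames = [{"n": n}]

    # 'Call' phase: push a new frame for each nested call.
    while frames[-1]["n"] > 0:
        frames.append({"n": frames[-1]["n"] - 1})

    # 'Return' phase: pop frames and combine their results.
    result = 0
    while frames:
        frame = frames.pop()
        result += frame["n"]
    return result

# A depth that a naive recursive Python function could not reach with
# CPython's default recursion limit.
deep = sum_to(100_000)
```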

todo move a lot of this section to details

Overview of the module system

Oot modules are units of compilation, and units of naming scope. Each module is composed of one or more files. (todo: can you extend/override parts of someone else's module by adding a file in a different directory?)

By default, Oot compiles each module separately. When a module is changed, other modules don't need to be recompiled.

Type inference does not cross module boundaries.

Optionally, on some targets a program may be able to be 'globally compiled', meaning that all modules are compiled together, possibly presenting more opportunities for optimization.

Overview of the standard libraries, repository, packaging and deployment

Oot has four 'tiers' of libraries:

Among the canonical libraries are found libraries for:

There is a canonical repository where non-standard libraries may be found. The repository has a reputation-weighted voting mechanism to encourage the Oot community to coalesce around popular libraries, and to bring these to the attention of the Oot project for possible designation as 'canonical'.

All 'core' libraries (and maybe all 'canonical' libraries; maybe excepting libraries that exist only to wrap external code, such as GUI framework wrappers) must have a 'pure Oot' version (ie a version that does not require linking to any external code), although they can have a default version that does use non-Oot code.

Overview of the type system

TODO

atomic types: int, float, str, bool, NIL, func, opaque 'nominal types'

attributes, like typeclasses/interfaces

Oot has optional static typing; variables which are not given a type are implicitly typed as 'dynamic', which means that type-checking will be done at runtime instead of at compile-time.

Oot is memory-safe; its type system does not allow the programmer to do anything that could lead to a 'hard crash' (when the operating system terminates the program due to eg a segmentation fault); for example, Oot does not permit unsafe typecasts, such as casting an integer to a pointer. There are two exceptions, both of which require the person running the Oot compiler or interpreter to give special 'dangerous' compiler options, and so to be aware of what they are getting into; first, Oot programs are capable of calling library code written in other languages, and this other code may have bugs that lead to hard crashes; second, compiler options can turn off run-time typechecking for the sake of performance.

Oot provides type inference within modules (but not across module boundaries; public interfaces between modules must be explicitly typed).

Oot's type system is extensible; type extension libraries may alter the type rules used by Oot's typechecker.

'Everything is an interface'; the type signatures in Oot only care about which interfaces are provided by objects, not about how these objects are implemented. In Java, this would be like only caring about what interface an object was, not what class it is; in Haskell, this would be like only having typeclasses appearing in signatures, not concrete datatypes.
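Python's `typing.Protocol` gives a similar flavour: a signature names only the interface it needs, and any object providing the right methods qualifies, whatever its class. The `HasArea` protocol and the two shape classes are invented for this sketch.

```python
from typing import Iterable, Protocol


class HasArea(Protocol):
    def area(self) -> float: ...


class Square:
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Rect:
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes: Iterable[HasArea]) -> float:
    # The signature mentions only the interface, never Square or Rect.
    return sum(shape.area() for shape in shapes)


total = total_area([Square(2.0), Rect(1.0, 3.0)])
```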

Overview of metaprogramming

Lisp is a homoiconic language built upon lists. Oot is a near-homoiconic language built upon labeled graphs.

Metaprogramming can be hard to read. Therefore, Oot provides a variety of metaprogramming constructs so as to encourage programmers to use the LEAST powerful kind of metaprogramming that suits their needs.

The character '#' is often used in metaprogramming constructs. (todo ^?)

todo regions, eg transactions, custom evaluation strategies $custom, first-class environments, macros, reader macros

todo how to make scoping rules metaprogrammable, eg stuff like not having to prefix 'self' in oop

Labels

To create a label for an AST node, use '^@@LABEL'. To reference a label value, use ^@LABEL:

^@@POSITION1;
x = x + ^@@POSITION2 1;
target = ^@POSITION1;
goto target;

Labels must be unique within any given source code file.

Labels vs annotations vs keyword literals

KEYWORD_LITERALS, which are in uppercase, are literal values.

Annotations are not (base-level) nodes within the AST at all, but rather metadata attached to other AST nodes. They are 'notes in the margin' rather than residing in the 'main text'.

Like annotations, ^@LABELS are metadata attached to other AST nodes. They have the specific purpose of uniquely identifying a location in the AST. They must be unique per source-code file.

Various metaprogramming constructs are supposed to be 'hygienic' with respect to keyword literals and labels; that is, they are not supposed to match on keyword literal or label constants; they can perceive the presence of a keyword literal or label, and compare keyword literals and labels to each other, but they can't recognize particular keyword literals or labels and do something special when they see them. By contrast, metaprogramming constructs can always match on annotations. (we say 'supposed to be' rather than 'are' because clever metaprogramming can evade this restriction; so it's more of a community norm than a language-implementation-imposed requirement). Note that even 'hygienic' metaprogramming constructs may be 'passed in' particular keyword literals and labels to compare against; but this should be done separately in each source code file. The purpose of this restriction is to make it easy for the reader to see where black magic is occurring.

The relation between Oot and Oot Core

todo: this entire tutorial/introduction should probably cover Oot Core first, then present the rest of Oot as syntactic sugar for Oot Core.

Oot is built on a small core language called "Oot Core", by using its own metaprogramming facilities. Oot Core is a subset of Oot; any Oot Core code is valid Oot code (except that various extra-Core identifiers and operators which are considered to be part of the language Oot, may not be (re)defined in Oot, but may be defined/assigned to in Oot Core).

The rules of parsing for Oot Core and Oot are identical. The reference implementation of Oot is written in Oot Core. Therefore, any implementation of Oot Core is also sufficient to compile and execute Oot. However, for performance reasons, implementations of Oot may in addition choose to provide platform-optimized handling of various extra-Core constructs.

Oot adds various things to Oot Core:

Some key ways in which Oot differs from other languages

todo: put this earlier



Continue with:


todo:

perhaps, if it's not already this way, move all grouping-related syntax into one section and say at the top of that section, "You could just explicitly group everything like Lisp, but that requires a lot of parentheses. Oot has some features to help you avoid writing so many parentheses:

precedence associativity commas implicit blocks (although note that i'm kind of souring on having unmatched parens at all)

also, mb a section on parens vs blocks

---

The default evaluation strategy is lenient, but in order to support infinite data structures, we allow the marking of datastructures as lazy. Should we also allow marking individual function arguments as lazy, or is this redundant with marking datastructures, or is it too complicated?

---