I am just writing down ideas. I have many pages of notes on things i may or may not want to put into Oot. Please note that my philosophy is, whenever i take notes for myself, if i don't want to keep them private, to put them on my website; these notes are therefore here, not because they are intended to be readable by anyone other than me, but just as notes to myself. Over time i hope to evolve them into something readable and after that i'll remove this disclaimer.
At this point, the design is unfinished; there is no coherent concept of a language called 'Oot', just a bunch of ideas towards such a proposal. There is no timeframe to actually implement this, and it most likely will never get finished.
To be clear, i repeat myself: there is no such thing as Oot. Oot has neither been designed nor implemented. These notes about 'Oot' contain many conflicting proposals.
This document is very out of date and no longer serves as a good introduction to the evolving language proposal.
You may want to look at proj-oot-whyOot.
There are a bunch of other files in this folder with more details and notes on various parts of Oot: proj-oot.
update: i'm starting to develop an OVM (Oot Virtual Machine) implementation. This is not Oot, but it is the language that Oot will be implemented in (for portability), and it will also be the IL to which Oot programs are compiled. See https://gitlab.com/oot/ovm
Oot is a readable, massively concurrent, programmable programming language.
Oot is a general-purpose language that particularly targets the following application domains:
TODO: the above lists, while detailed and accurate, are maybe a little too long and too wordy; compared to the list in the OLD section below
Why not Oot?
Oot aspires to combine the readability and ease of Python, the functional laziness of Haskell, the commandline suitability of Perl, the straightforwardness of C, the metaprogrammability of Lisp, and the simplicity of BASIC (well, not quite), with massive concurrency.
TODO: unify this with whyOot.txt:Intro
print! "Hello world!"
In this section we briefly explain the basics of Oot. Our purpose is to give you a sense of Oot, and also the background you'll need for understanding the examples in the Tour.
TODO this is much too detailed, move a lot of this into 'details'
If you are coming from another language, you may be surprised by the following aspects of Oot's syntax and builtin functions:
Unlike e.g. Python, the AMOUNT of whitespace does not matter in Oot; but the PRESENCE or ABSENCE of whitespace, and the TYPE of whitespace (whether it is a space, a newline, or a blank line) does matter.
Only punctuation affects the way things are parsed. There are no alphanumeric 'reserved words' except for the 26 one-letter uppercase macros (eg 'A' is infix boolean 'and').
A person reading code does not have to look up any function or macro definitions to see how things will parse (except of course within metaprogramming constructs that essentially pass a string to something like a custom 'eval').
The general idea is to start with a hypothetical homeoiconic syntax, but (in comparison to Lisp) built upon associative arrays/labeled trees instead of Lisp's lists/s-exprs/unlabeled trees. Then, add syntax for convenience, but the syntax is generic rather than specific to particular language constructs. Symbols may have more than one meaning depending on context, but only when these meanings are conceptually related (to aid memory).
TODO (see each of the two, somewhat conflicting, sections below for now):
note that within data constructors, the lhs of = and / can be dynamically computed at runtime by using parentheses (entering code context) TODO MOVE TO DETAILS
todo within data constructors, ["" means 'the items in here are all quoted', eg [""a b c] == ["a" "b" "c"]; except that as a special case, both [""] and ["" ] are equivalent to [ "" ] (a list containing only the empty string).
Semicolons denote a sequencing relationship between the things they delimit.
Semicolons are often implicitly added at newlines, but any Oot program can be written as a 'one-liner' by using explicit semicolons.
To delimit a line without denoting a sequencing relationship (this also hints to the Oot implementation that it may be worth it to parallelize the evaluation of the two things broken up in this way), use colon-semicolon, ':;', which you can think of as a semicolon with a modifier.
todo: in 'a ; b :; c ; d', does ; or :; have precedence, that is, is this like a ; (b :; c) ; d, or like (a ; b) :; (c ; d)?
Comments are started by two or more adjacent semicolons. If the semicolons are followed by whitespace, the comment runs to the end of the current line. Otherwise, the semicolons and the non-whitespace characters immediately following them together form a custom delimiter, and the comment runs until the next occurrence of that delimiter.
print 1+1 ;; nice day today
print 1+1 ;;;;; yay!
print 1+ ;;xyz one ;;xyz 1
print 1+ ;;xyz man this is
quite a long
comment
;;xyz 1
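The comment rule can be sketched in Python. This is only a rough model of the rule as described here, not an Oot lexer: the function name strip_comments is made up for illustration, and it assumes that ';;' never appears inside a string literal.

```python
import re

SEMIS = re.compile(r";;+")  # two or more adjacent semicolons

def strip_comments(src: str) -> str:
    """Remove comments from src (hypothetical helper; ignores string literals)."""
    out = []
    i, n = 0, len(src)
    while i < n:
        m = SEMIS.match(src, i)
        if not m:
            out.append(src[i])
            i += 1
            continue
        j = m.end()
        if j >= n or src[j].isspace():
            # Semicolons followed by whitespace: comment runs to end of line.
            nl = src.find("\n", j)
            i = n if nl == -1 else nl  # keep the newline itself
        else:
            # Semicolons plus the following non-whitespace run form the delimiter.
            k = j
            while k < n and not src[k].isspace():
                k += 1
            delim = src[m.start():k]
            end = src.find(delim, k)
            i = n if end == -1 else end + len(delim)
    return "".join(out)
```

With this sketch, both the one-line and the multi-line ';;xyz' examples above reduce to the tokens print, 1+ and 1.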
Identifiers can consist of alphanumeric characters and dashes, except that they cannot start with a dash, and if they end with a dash then that has a special meaning. The function of a string of alphanumeric characters and dashes varies depending on case:
Identifiers that end with one dash are 'private' to the module they are defined in. Identifiers that end with two dashes are reserved for use by the Oot language.
Grouping is supplied by parentheses (), blocks {}, graph constructors [], semicolons ;, and colons :.
Blocks are constructors for first-class (lists of) expressions.
The precedence of infix operators can be determined by the characters in them (eg you don't have to look up the function definition of infix operators in order to determine how they parse).
todo: mb this should be ',,' ',,,' etc; they can still be multidim when in data context; then ':' can be left for type annotation; although we could use '::' for that instead
A suffix colon begins a (possibly multiline) parenthetical group; a prefix colon identifies a keyword, and creates an implicit block as its argument.
For example:
if: condition
:then
first
:else
second
Infix colons create implicit parentheses on both the lefthand and the righthand of the colon. Multiple infix colons do the same thing, but have lower precedence the more of them there are. You can think of infix colons as 'pushing away' everything on the left from everything on the right.
For example,
a b : c d
is equivalent to
(a b) (c d);
Another example:
a b :: c d : e f
g h : i j
is equivalent to:
(a b) ((c d) (e f)); (g h) (i j);
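A minimal Python sketch of the infix-colon grouping rule, applied to a single statement that has already been split into tokens. The function name group is hypothetical, and the handling of several colon runs of the same length in one statement (treated here as a flat sequence of groups) is an assumption, since these notes do not specify it.

```python
def group(tokens):
    """Return a fully parenthesized string for one statement (illustrative only)."""
    colon_runs = [t for t in tokens if t and set(t) == {":"}]
    if not colon_runs:
        return " ".join(tokens)
    # The longest colon run has the lowest precedence, so split there first.
    widest = max(colon_runs, key=len)
    parts, current = [], []
    for t in tokens:
        if t == widest:
            parts.append(current)
            current = []
        else:
            current.append(t)
    parts.append(current)
    # Each side of the split becomes its own implicit parenthetical group.
    return " ".join("(" + group(p) + ")" for p in parts)

print(group("a b : c d".split()))
print(group("a b :: c d : e f".split()))
```

The two calls reproduce the groupings given in the examples above for 'a b : c d' and for the expression 'a b :: c d : e f'.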
Infix comma turns the current grouping into an implicit data constructor:
3 , 5 == [3 5]
To turn a function into an infix operator, surround it with <>s:
div 10 5 == 2 == 10 <div> 5
The thing in the middle of the <>s must be an identifier or literal, not some more complicated expression. And the <>s must surround it without spaces in between. (this is not ambiguous with usage of < and > as less-than, greater-than, because less-than/greater-than are non-associative and so must be explicitly parenthesized if they appear in the same grouping; eg 0 < 5 > 2 is illegal because it is ambiguous between (0 < 5) > 2 and 0 < (5 > 2))
To turn an infix operator into a non-infix (prefix) function, surround it with parentheses:
5 + 3 == 8 == (+) 5 3
addfn = (+)
add_one_to_all = map (+)
There is also something called a 'region', marked by double braces: {{}}. Regions are distinguished by an associated value, which immediately precedes the region opening and immediately succeeds the region closing. For example, the following region is associated with the value '3':
v = 3
v{{print "Hi"}}v
Regions do NOT affect parsing or grouping but are rather a method for marking locations in code or data with boundary annotations.
Any Oot program can be expressed in a single line. Newlines usually represent implicit semicolons (except when the line ends with a '\', or when all parentheses have not yet been closed). A region of text bounded by square brackets or blank lines is called a 'paragraph'. Paragraphs implicitly close open parentheses. Eg:
print 1 + 1
print (1 + 1)
print (1 + 1

print 2
is equivalent to:
print 1 + 1; print (1 + 1); print (1 + 1); print 2;
In Oot, it sometimes matters whether an operator is separated from its argument by a space (freestanding), or whether it is 'attached', and if it is attached, whether it appears on the left or the right side of its argument. So there are four cases, each of which may mean different things:
Often prefix versions of operators are something like a 'constructor', and the suffix version is the corresponding 'destructor'.
todo: should infix attached and unattached be the same except for grouping and mixed case? i think so
make a reference:
y = {x}
dereference: !y
Everything is an expression. The last line of a function is its (primary) return value. The last line of a branch within an 'if' statement is its return value.
To apply the function 'add' to 1 and 'x' and store the result in 'result':
result = add 1 x
(function application is by juxtaposition)
Partial function application is achieved just by not giving all required arguments:
increment = add 1
result = increment x
result == increment x == (add 1) x
(functions are curried and left-associative)
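For comparison, here is roughly the same idea in Python, where partial application has to be requested explicitly via functools.partial rather than falling out of juxtaposition. The names add, increment and x mirror the example above; the value 41 is arbitrary.

```python
from functools import partial

def add(a, b):
    return a + b

# Supplying only the first argument yields a new one-argument function.
increment = partial(add, 1)

x = 41
result = increment(x)
print(result == add(1, x))
```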
Functions can have keyword arguments, which are bound using '/':
result = bake-n-pies 3 KIND/"pumpkin"
The keywords in keyword arguments (ie the things on the left side of the '/', eg 'kind' above) can be written in lowercase, in which case they will be autocapitalized, eg:
result = bake-n-pies 3 kind/"pumpkin"
If you want to dynamically compute the keyword at runtime, put parentheses around it, eg:
result = bake-n-pies 3 (keyword-from-str "KIND")/"pumpkin"
Keyword arguments are always optional, and optional arguments are always keyword arguments. When a function is partially applied, any keyword arguments not explicitly given are implicitly assigned to their defaults; they do not remain open for later assignment; however, this behavior can be overridden by using the $/ operator (todo: sure about this? or should the defaults happen when all of the positional arguments are used up?):
g = f x $/
A function can directly access the args it has been passed as 'args--'. To access the args from a higher lexical scope, TODO ('..'? more dashes?)
x = 3
x = 4
Note that in Oot, '=' does not behave like it does in mathematics; in mathematics, if x equals something, then it is always equal to that thing and can't change to be not equal to it later, but in Oot, '=' is not a statement of equality but rather an action (assignment), and what is assigned to a given variable can change over time.
Similar syntax to variable assignment, except use ':=' instead of '=':
f x y := x + y + 3
f x y := {k = 3; x + y + k}
f 1 2 == 6
Functions with multiple return values (and optional return values) are created using a 'double assignment' syntax. Calling the function normally gives only the first return value. To get the other return value(s), use an assignment with '/'s attached to some of the variables to indicate which arguments they capture (the same syntax as giving keyword arguments to functions). Since the optional return arguments must be assigned via keyword, multiple return arguments can appear in any order:
z k = f x y := {k = 3; z = x + y + k}
main_result = f 1 2
main_result == 6
main_result K/secondary_result = f 1 2
main_result == 6
secondary_result == 3
K/secondary_result main_result = f 1 2
main_result == 6
secondary_result == 3
Note that commas on the LHS create an implicit data constructor, which destructures a single return value, rather than separating multiple return values:
z k = g x y := {k = 3; z = [x y]}
main_result = g 1 2
main_result == [1 2]
main_result K/secondary_result = g 1 2
main_result == [1 2]
secondary_result == 3
[x y] K/k = g 1 2
x == 1
y == 2
k == 3
[x y] = g 1 2
x == 1
y == 2
To cause a function to return all of its potentially multiple return values as a single graph, use a postfix asterisk ('*') on the function call:
z k = g x y := {k = 3; z = [x y]}
result = g* 1 2
result == [[1 2] K/3]
[[x y] K/k] = g* 1 2
x == 1
y == 2
k == 3
[x y], K/k = g* 1 2
x == 1
y == 2
k == 3
[x y] K/k = g* 1 2 ;; ERROR; although g returns two values, g* only returns one (unlabeled) value, yet here we erroneously attempt to extract a second value labeled "K"
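One way to picture the difference between 'g' and 'g*' is the following Python sketch. It is only a model under assumptions: the helpers call and call_star are hypothetical, the labeled return values are modeled as a dict, and the unlabeled primary value is stored under key 0 in the combined result.

```python
def g(x, y):
    """Model of 'z k = g x y := {k = 3; z = [x y]}': primary value plus labeled extras."""
    k = 3
    z = [x, y]
    return z, {"K": k}

def call(fn, *args):
    """Ordinary call: only the primary (first) return value is visible."""
    primary, _labeled = fn(*args)
    return primary

def call_star(fn, *args):
    """Model of 'g*': everything is packed into one graph-like value."""
    primary, labeled = fn(*args)
    return {0: primary, **labeled}

main_result = call(g, 1, 2)
everything = call_star(g, 1, 2)
print(main_result, everything)
```

In this model the starred call yields a single value, which is why it has to be destructured as one graph rather than as two separate return values.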
todo
Unary (unary prefix):
! logical NOT
- unary minus (arithmetic negation)
Binary:
On numbers, +, -, *, <, <=, >, >= have their usual arithmetic meanings. In addition:
& logical AND
| logical OR
Trinary:
==== Custom operators ====
To define a custom operator, first define an ordinary function, then assign it to the operator, eg:
f x y = x + y
+<< = f
Custom operators must be composed entirely of punctuation characters.
Unary custom operators must begin with '!'
Binary custom operators must begin with one of:
*+>%&|-
Any binary operator beginning with * or + is assumed to be associative with itself. Other operators do not associate with themselves and must be explicitly parenthesized if they are repeated.
Ternary custom operators must begin with '<'
Unary custom operators are used by attaching them as a prefix to their argument, eg:
Binary custom operators are used freestanding or infix. They have three precedence levels:
=<>%&|
Ternary custom operators are used by enclosing the second argument with the operator and its mirror image, with no intervening spaces, and surrounding that with arguments arg1 and arg3 (todo: better explanation here). Eg:
f x y z = (x - y) * (z - y)
<+ = f
5 <+1+> 3 == (5 - 1) * (3 - 1) == 8
To aid readability, the middle argument to a ternary operator can only be a single identifier or a literal, not a more complicated expression:
5 <+(0+1)+> 3 ;; SYNTAX ERROR; middle argument to ternary must be single identifier or literal
After the initial character, a custom operator may contain one or more characters which are any of:
*+<>%*&|~@#$^?
(todo can they really contain any of those characters?)
Note that binary custom operators may not contain '!' or '-' after the first character, to allow things like arg1*-arg2 to be easily and unambiguously parsed as arg1*(-arg2).
Note: there is one unary operator consisting of a single '-' character, unary minus, eg "-3"; this is a special case; additional custom unary operators cannot be defined which start with '-', because that would make it difficult for the reader to tell at a glance if the operator was unary or binary.
There may be more than one unary operator applied to the same alphanumeric base. Unary operators associate to the right with any other unary operators.
Note that although function application binds tighter than any freestanding infix, attached infix binds tighter than function application, eg "f x + 2" is "(f x) + 2" but "f x+2" is "f (x+2)", because attached infix is as if the attached group was surrounded by parentheses. Within an attached infix grouping, the same rules of precedence hold.
To the extent that the order of operations is not determined by these rules, expressions must have explicit parentheses to resolve the remaining ambiguity (it is a syntax error not to).
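The 'attached infix binds tighter than function application' rule can be modeled as a pre-pass that wraps attached groups in explicit parentheses. The following Python sketch is illustrative only: the function name parenthesize_attached is made up, and it handles just '+' and '*' between alphanumeric operands (it deliberately leaves '-' alone, since dashes can also be part of identifiers).

```python
import re

# A whitespace-free chunk such as "x+2" or "a*b+c" counts as an attached group.
ATTACHED = re.compile(r"^\w+(?:[+*]\w+)+$")

def parenthesize_attached(src: str) -> str:
    """Rewrite attached infix groups as explicitly parenthesized, spaced groups."""
    out = []
    for chunk in src.split():
        if ATTACHED.match(chunk):
            spaced = re.sub(r"([+*])", r" \1 ", chunk)
            out.append("(" + spaced + ")")
        else:
            out.append(chunk)
    return " ".join(out)

print(parenthesize_attached("f x+2"))
print(parenthesize_attached("f x + 2"))
```

After this pass, the ordinary rule (function application binds tighter than freestanding infix) is enough to read 'f x + 2' as '(f x) + 2'.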
^# as a statement of its own imports a module directly into its containing namespace:
^#math
cos 0 == 1
By addressing into ^#, you can import individual items from the module namespace:
^#math.cos
cos 0 == 1
todo: should 'cos' be in quotes
Or multiple names at once:
^#math.[cos sin]
cos 0 == 1
sin 0 == 0
^## imports a module and assigns it to a variable whose name is the module name:
^##math
math.cos 0 == 1
^# as part of an expression evaluates to the imported module, and can be used to eg assign the module to a different name:
mymath = ^#math
Names within a module may also be accessed via ^# within an expression:
mycos = ^#math.cos
Or multiple names at once:
mycos, mysin = ^#math.[cos sin]
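For readers who know Python, the import forms above correspond roughly to the following. This is only an analogy; the mapping in the comments is my reading of these notes, not a statement about how Oot imports would actually be implemented.

```python
import math                    # roughly: ^##math (module bound to its own name)
from math import cos, sin      # roughly: ^#math.[cos sin]
import math as mymath          # roughly: mymath = ^#math

mycos = math.cos               # roughly: mycos = ^#math.cos
mycos2, mysin2 = math.cos, math.sin   # roughly: mycos, mysin = ^#math.[cos sin]

print(cos(0) == 1, sin(0) == 0, mymath is math)
```

The statement form '^#math' on its own line would be closest to Python's 'from math import *'.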
^#! is 'raw' import, which means just including the text of the destination file at this point within this file as if it were typed here:
if filename.txt contains "x" then
x = "Hello World!"; P (^#! filename.txt)
is equivalent to:
x = "Hello World!"; P (x)
Note: it is a syntax error to have unbalanced grouping in either the containing file or the raw imported file; eg you cannot have "{x = ^#! filename.txt" in the containing file and "1}" in filename.txt
All ^# and ^#! imports are compile time by default. To do a run-time import, use ^#$ and ^#!$ instead. Most run-time import metaprogramming does not affect the containing code (the code outside of the modules/files which are being imported).
In order to make it easier to quickly skim an unfamiliar function, footnotes provide a mechanism to separate the exposition of the 'main idea' of a piece of code from 'details' (for example, error handling). Example:
x = 10
y = if (errorCondition): handleError else: doSomething x
print ('y = ' + str(y))
can also be written as:
x = 10
y = doSomething x #1
print ('y = ' + str(y))
When footnotes are used, the intended implication is that the reader is encouraged to first read the block of code without bothering to consult the footnotes, to get an idea of what is generally going on, and only afterwards to delve into the footnotes. Footnotes which are not just 'details' but which rather change the 'basic idea' of what is going on are considered to be an abuse of this facility; ordinary macros should be used for this instead.
todo
This is used in graph constructors, and in pattern matching.
Blocks are suspensions (they are 'run' with '!').
TODO (some applications: 0-ary and keyword)
TODO apply--
Although we emphasize immutable data and eschew unexpected side-effects, we permit 'local mutation', for example, mutable variable assignment, and creation of new values by operations that look syntactically like 'mutating' old values. This is because these can be converted, through simple syntactic transformations that are local to one function, into purely functional code.
The values themselves are immutable (except for values stored in 'reference variables', todo what's the syntax/sigil for those?). Therefore these 'mutations' have no side-effects aside from the rebinding of the indicated variable to a new value; in particular, mutating a parameter passed into a function has no effect outside of that function, and if you do "a = b", then the value of b is copied into a, and mutations to 'b' will have no further effect on 'a'.
What is dangerous is not these local violations of referential transparency, but global ones; for example, hidden aliasing between variables, especially across function boundaries.
The '=' sign is used for assignment. When a simple assignment is made (a simple assignment is when the lhs (left hand side) consists of just a variable name), the variable is rebound to the value on the rhs (right hand side).
If a composite assignment is made, one of the form "f x", where f is any expression, then a new function value 'g' is created which is like f, except that on input x, the value returned by 'g x' is the rhs; then this new function is assigned to f (as if "f = g" were written; if 'f' is itself composite, this causes a recursion).
eg:
x = 3
x = 4
x == 4
x = 3
x = 4
f = [1 2 3]
f 1 = 5
f == [1 5 3]
f2 = [10 [1 2 3] 30]
f2 1 1 = 5 ;; first, (f2 1) 1 = 5, so a g is created such that g = (f2 1) except that g 1 = 5; then we do (f2 1) = g, and recurse
f2 == [10 [1 5 3] 30]
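The recursive reading of composite assignment can be sketched in Python as a purely functional update: each level builds a new value that agrees with the old one except at one index, and the variable is then rebound. The helper name assign is hypothetical, and nested Python lists stand in for Oot graphs.

```python
def assign(container, path, value):
    """Return a copy of container with the element at path replaced by value."""
    if not path:
        return value
    head, *rest = path
    updated = list(container)  # shallow copy; the original is never mutated
    updated[head] = assign(container[head], rest, value)
    return updated

f = [1, 2, 3]
f = assign(f, [1], 5)          # models: f 1 = 5

original = [10, [1, 2, 3], 30]
f2 = assign(original, [1, 1], 5)   # models: f2 1 1 = 5
print(f, f2, original)
```

Because only copies are modified, any other variable still holding the old value is unaffected, which matches the 'local mutation' story above.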
TODO , if the class of the value on the lhs (left hand side) of the assignment provides a function named "set--", then this is called. TODO: "the class of"?
Two equals signs, '==', is the boolean 'is' function as a binary comparison operator (todo).
Three equals signs, '===', is structural equality.
Four equals signs, '====', is pointer equality.
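As a rough analogy only, Python already distinguishes the last two of these: '==' on lists compares structure, and 'is' compares object identity. The sketch below uses them as stand-ins for '===' and '====' respectively; Oot's two-equals-sign form is left out because its meaning is still marked todo above.

```python
a = [1, [2, 3]]
b = [1, [2, 3]]   # same structure, separate object
c = a             # same object as a

structurally_equal = (a == b)   # stand-in for: a === b
pointer_equal_ab = (a is b)     # stand-in for: a ==== b
pointer_equal_ac = (a is c)     # stand-in for: a ==== c
print(structurally_equal, pointer_equal_ab, pointer_equal_ac)
```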
Think of '//' as a shorthand for '-->'. It comes up in constructs like 'cond':
For example:
i = 1
j = 2
cond: i
0 // "zero"
1 // "one"
j // "two"
or
i = 1
j = 2
cond: i
i == j // "equal"
i -= j // "not equal"
It also comes up in other situations where there is a list of rules, where the part to the left of the '//' is the condition under which the rule applies, and the part to the right of the '//' is what is true (or what should be executed) when the condition is true.
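A list of 'condition // result' rules can be modeled in Python as a list of pairs that is scanned in order. This sketch covers only the first form of 'cond' (matching a subject value against each left-hand side); the function name cond, the first-match-wins behavior, and returning None when nothing matches are all assumptions.

```python
def cond(subject, rules):
    """Return the result of the first rule whose left side equals subject."""
    for pattern, result in rules:
        if pattern == subject:
            return result
    return None  # assumed behavior when no rule applies

i = 1
j = 2
answer = cond(i, [
    (0, "zero"),
    (1, "one"),
    (j, "two"),
])
print(answer)
```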
Because of their centrality to Oot, graph constructors have already been covered above.
Atomic literals consist of ints (3), floats (3.0), strings ("hi"), booleans (TRUE), and NIL.
Some Oot implementations may support Unicode in source files. In this case, Unicode is permitted only within string literals.
String literals by default support interpolation with '$' and character escaping with '\'; within an ordinary string literal, '$' and '\' and '"' must be themselves escaped by '\'. There are also 'raw string literals', of the form r"". Raw string literals do not support escaping, and so cannot contain the character '"'. (todo: should we instead do what Python does [1]