Universalis

Definitions

  1. Strings (\(s\), \(t\), \(u\)): A lowercase English \(s\), \(t\), or \(u\) denotes a string.

  2. Characters (\(\mathfrak{a}\), \(\mathfrak{b}\), \(\mathfrak{c}\), etc.): Lowercase Fraktur letters represent characters. Subscripts will occasionally be used with Fraktur letters to denote characters at specific positions within strings (\(\mathfrak{a}_1\), \(\mathfrak{a}_2\), …).

  3. Empty Character (\(\varepsilon\)): The lowercase Greek letter epsilon, \(\varepsilon\), represents the empty character.

  4. Delimiter (\(\sigma\)): The lowercase Greek letter sigma, \(\sigma\), represents the delimiter character, e.g. a space.

  5. Words (\(a\), \(b\), \(c\), etc.): Lowercase English letters represent words. Subscripts will occasionally be used to denote words at specific positions within sentences (\(a_1\), \(a_2\), …).

  6. Sentences (\(\zeta\)): The lowercase Greek letter zeta, \(\zeta\), represents sentences. Subscripts will occasionally be used to enumerate sentences in a language (\(\zeta_1\), \(\zeta_2\), …).

  7. Alphabets (\(\Sigma\)): The uppercase Greek letter sigma, \(\Sigma\), represents alphabets.

  8. Language (\(L\)): The uppercase English letter \(L\) represents a language.

  9. Corpus (\(C_L\)): The uppercase English letter \(C\) with a subscript \(L\), \(C_L\), represents a corpus of sentences within a given language \(L\).

Alphabet

The aggregate of all characters is called an alphabet and is denoted by an uppercase sigma, \(\Sigma\),

\[\Sigma = \{ \varepsilon, \sigma, \mathfrak{a}, \mathfrak{b}, \mathfrak{c}, ... \}\]

Language

A language \(L\) is a set of strings constructed through concatenation on an alphabet \(\Sigma\) wherein each construction is assigned semantic content,

\[L = \{ a, b, c, ... \}\]

Corpus

A corpus \(C_L\) is a set of strings constructed by inserting a delimiter between words in language \(L\) and assigning semantic meaning,

\[C_L = \{ \zeta_1, \zeta_2, ... \}\]

Linguistic Hierarchy

  1. Strings: \(\iota\), \(a\), \(\zeta\)

  2. Sets: \(\Sigma\), \(L\), \(C_L\)

  3. Character Membership: \(\iota \in \Sigma\)

  4. Word Membership: \(a \in L\)

  5. Sentence Membership: \(\zeta \in C_L\)

To clarify the relationship between strings, characters, alphabets, words, languages, sentences and corpora in plain language,

  1. All characters, words and sentences are strings.

  2. All alphabets, languages and corpora are sets of strings.

  3. All characters belong to an alphabet.

  4. All words belong to a language.

  5. All sentences belong to a corpus.

Character-level Set Representations

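Let \(t\) be a string with characters \(\mathfrak{a}_i\). The character-level set representation of \(t\), denoted by the uppercase letter \(T\), is defined as the set of ordered pairs \((i, \mathfrak{a}_i)\) obtained by removing each empty character, \(\varepsilon\), from \(t\) and indexing the remaining characters consecutively from 1 in their original order.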

Example

Let a string be given by,

\[t = (\mathfrak{ab})(\varepsilon)(\mathfrak{c})\]

Then its character-level set representation is given by,

\[T = \{ (1, \mathfrak{a}), (2, \mathfrak{b}), (3, \mathfrak{c}) \}\]
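
The construction can be sketched in Python. This is only an illustration under stated assumptions: a string is modelled as a list of single characters, the empty character \(\varepsilon\) is modelled as the empty Python string, and the names `EMPTY` and `char_level_representation` are hypothetical rather than part of the formal system.

```python
EMPTY = ""  # assumed stand-in for the empty character (epsilon)


def char_level_representation(chars):
    """Return T as a set of (index, character) pairs.

    Empty characters are dropped first; the remaining characters
    are then numbered consecutively starting from 1.
    """
    non_empty = [c for c in chars if c != EMPTY]
    return {(i, c) for i, c in enumerate(non_empty, start=1)}


# The example string t = (ab)(epsilon)(c), written character by character.
T = char_level_representation(["a", "b", EMPTY, "c"])
```

Here `T` equals the set of pairs given in the example: the empty character contributes no pair and does not consume an index.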

String Length

Let \(t\) be a string. Let \(T\) be the character-level set representation of \(t\). The string length of \(t\), denoted \(l(t)\), is the natural number which satisfies the following formula,

\[l(t) = \lvert T \rvert\]

Character Index Notation

Let \(t\) be a string with character-level representation \(T\),

\[T = \{ (1, \mathfrak{a}_1), (2, \mathfrak{a}_2), ..., (l(t), \mathfrak{a}_{l(t)}) \}.\]

Then for any \(i\) such that \(1 \leq i \leq l(t)\), \(t[i]\) is defined as \(\mathfrak{a}_i\), where \((i, \mathfrak{a}_i) \in T\).
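
String length and character indexing can both be read off the character-level representation. The following Python sketch assumes the same modelling choices as the formal definitions suggest (a list of single characters, with the empty Python string standing in for \(\varepsilon\)); the function names `representation`, `length` and `char_at` are hypothetical.

```python
EMPTY = ""  # assumed stand-in for the empty character (epsilon)


def representation(chars):
    """Character-level set representation: (index, character) pairs."""
    non_empty = [c for c in chars if c != EMPTY]
    return {(i, c) for i, c in enumerate(non_empty, start=1)}


def length(chars):
    """l(t) = |T|, the number of pairs in the representation."""
    return len(representation(chars))


def char_at(chars, i):
    """t[i], defined only for 1 <= i <= l(t)."""
    lookup = dict(representation(chars))
    if i not in lookup:
        raise IndexError(f"index {i} is outside 1..{len(lookup)}")
    return lookup[i]


t = ["a", "b", EMPTY, "c"]
n = length(t)          # empty characters do not count towards the length
third = char_at(t, 3)  # indices are 1-based, so this is the last character
```

Note that indexing is 1-based, matching the condition \(1 \leq i \leq l(t)\), and that an index outside this range is rejected rather than wrapped around.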

Relations

Containment

Let \(t\) and \(u\) be strings. \(t\) is said to be contained in \(u\), denoted by,

\[t \subset_{s} u\]

if and only if there exists a strictly increasing and consecutive function \(f: N_{l(t)} \to N_{l(u)}\) such that:

\[\forall i \in N_{l(t)}: t[i] = u[f(i)]\]
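
A minimal Python sketch of containment follows. It assumes that "consecutive" means \(f(i+1) = f(i) + 1\), so that \(f\) is determined by a single offset, and it models strings as ordinary Python strings with no empty characters. The name `is_contained` is hypothetical.

```python
def is_contained(t, u):
    """Return True if t is contained in u.

    Searches for an offset k such that f(i) = k + i maps every
    position of t onto a matching position of u. Such an f is
    strictly increasing and consecutive by construction.
    """
    n, m = len(t), len(u)
    for offset in range(m - n + 1):
        if all(t[i] == u[offset + i] for i in range(n)):
            return True
    return False


contiguous = is_contained("how", "hello how are you")
scattered = is_contained("hlo", "hello")
```

The second call illustrates why the function must be consecutive: the characters of "hlo" appear in "hello" in increasing order, but not at adjacent positions, so the string is not contained.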

Operations

Concatenation

The result of concatenating any two characters \(\iota\) and \(\nu\) is denoted \(\iota\nu\). To make the operands of concatenation clear, parentheses will sometimes be used to separate the characters being concatenated, e.g. \(\iota(\nu) = (\iota)\nu = (\iota)(\nu) = \iota\nu\). Character concatenation is defined inductively through the following schema,

  1. Basic Clause: \(\forall \iota \in \Sigma : \iota \varepsilon = \iota\)

  2. Inductive Clause: \(\forall \iota, \nu \in \Sigma : \forall s \in S: \iota(\nu s) = (\iota \nu)s\)

  3. Uniqueness Clause: \(\forall \iota, \nu, \omicron, \rho \in \Sigma : (\iota \nu = \omicron \rho) \to ((\iota = \omicron) \land (\nu = \rho))\)

  4. Comprehension Clause: \(\forall \iota \in \Sigma : \forall s \in S: \iota s \in S\)
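
The Basic and Inductive Clauses can be checked against ordinary Python string concatenation. This is a sketch under assumptions, not a model of the full schema: the empty character is taken to be the empty Python string, and `ALPHABET` and `SAMPLE_STRINGS` are small hypothetical samples chosen only for illustration.

```python
EMPTY = ""  # assumed stand-in for the empty character (epsilon)
ALPHABET = ["a", "b", "c", " ", EMPTY]  # hypothetical toy alphabet
SAMPLE_STRINGS = ["", "ab", "cab", "a b"]  # hypothetical sample of strings

# Basic Clause: concatenating the empty character on the right changes nothing.
basic_clause_holds = all(c + EMPTY == c for c in ALPHABET)

# Inductive Clause: grouping does not matter when prepending characters.
inductive_clause_holds = all(
    i + (n + s) == (i + n) + s
    for i in ALPHABET
    for n in ALPHABET
    for s in SAMPLE_STRINGS
)
```

Both checks succeed on these samples, which is consistent with Python's `+` on strings being associative with the empty string as identity. The check covers only the listed samples and is no substitute for the inductive definition.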

Inversion

Let \(s\) be a string. A string \(t\) is called the inverse of \(s\), denoted \(\text{inv}(s)\), if it satisfies the following conditions,

  1. \(l(t) = l(s)\)

  2. \(\forall i \in N_{l(s)}: t[i] = s[l(s) - i + 1]\)
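
The two conditions translate directly into code. The sketch below assumes strings are ordinary Python strings, which are 0-indexed, so the 1-based position \(l(s) - i + 1\) becomes the Python index `n - i`. The name `inv` mirrors the notation; the implementation itself is illustrative.

```python
def inv(s):
    """Build t with l(t) = l(s) and t[i] = s[l(s) - i + 1] for 1 <= i <= l(s)."""
    n = len(s)
    # 1-based position n - i + 1 corresponds to 0-based index n - i.
    return "".join(s[n - i] for i in range(1, n + 1))


reversed_word = inv("abc")
round_trip = inv(inv("hello how are you"))
```

Applying `inv` twice returns the original string, since position \(i\) is sent to \(l(s) - i + 1\) and then back to \(i\).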

Reduction

A string reduction, \(\varsigma(s)\), is an operation that removes all delimiters from a string, but preserves the relative order of characters.

Example

\[\varsigma(\text{hello how are you}) = \text{hellohowareyou}\]

Note

Reduction and inversion commute,

\[\varsigma(\text{inv}(s)) = \text{inv}(\varsigma(s))\]
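
The identity can be spot-checked in Python. The sketch assumes the delimiter \(\sigma\) is a single space and that strings are ordinary Python strings; `reduce_string`, `inv` and `SAMPLES` are hypothetical names, and a handful of samples is evidence rather than proof.

```python
DELIMITER = " "  # assumed concrete choice for the delimiter (sigma)


def reduce_string(s):
    """Remove every delimiter, keeping the other characters in order."""
    return "".join(c for c in s if c != DELIMITER)


def inv(s):
    """Reverse the order of the characters of s."""
    return s[::-1]


SAMPLES = ["hello how are you", " leading", "trailing ", "a  b", ""]

commutes = all(reduce_string(inv(s)) == inv(reduce_string(s)) for s in SAMPLES)
reduced = reduce_string("hello how are you")
```

Intuitively, inversion only reverses the order of characters and reduction only deletes delimiters while preserving relative order, so deleting before or after reversing leaves the same characters in the same reversed order.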

Axioms

Axiom C.0: The Equality Axiom

  1. \(\forall \iota \in \Sigma : \iota = \iota\)

  2. \(\forall \iota, \nu \in \Sigma : \iota = \nu \leftrightarrow \nu = \iota\)

  3. \(\forall \iota, \nu, \omicron \in \Sigma : (\iota = \nu \land \nu = \omicron) \to (\iota = \omicron)\)

Axiom C.1: The Character Axiom

\[\forall \iota \in \Sigma: \iota \in S\]

Axiom W.1: The Discovery Axiom

\[\forall a \in L: [ (l(a) \neq 0) \land (\forall i \in N_{l(a)}: a[i] \neq \sigma) ]\]
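
Read as a predicate on a single candidate word, the axiom requires a non-zero length and the absence of delimiters. A minimal Python sketch, assuming the delimiter is a single space and using the hypothetical name `satisfies_discovery`:

```python
DELIMITER = " "  # assumed concrete choice for the delimiter (sigma)


def satisfies_discovery(word):
    """Check l(a) != 0 and a[i] != sigma for every position i."""
    return len(word) != 0 and all(c != DELIMITER for c in word)


single_word = satisfies_discovery("hello")
two_words = satisfies_discovery("hello world")
empty = satisfies_discovery("")
```

Only the first candidate passes: the second contains a delimiter and the third has length zero.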

Axiom S.1: The Duality Axiom

\[( \forall a \in L: \exists \zeta \in C_{L}: a \subset_{s} \zeta ) \land ( \forall \zeta \in C_{L}: \exists a \in L: a \subset_{s} \zeta )\]
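
For a finite toy language and corpus, both conjuncts can be checked directly. The sketch below is an assumption-laden illustration: it uses Python's `in` operator on strings as the containment relation (it tests for a contiguous substring), and the sets passed to the hypothetical function `satisfies_duality` are invented examples.

```python
def satisfies_duality(language, corpus):
    """Check both halves of the Duality Axiom on finite sets of strings."""
    # Every word is contained in at least one sentence.
    every_word_used = all(
        any(word in sentence for sentence in corpus) for word in language
    )
    # Every sentence contains at least one word.
    every_sentence_has_word = all(
        any(word in sentence for word in language) for sentence in corpus
    )
    return every_word_used and every_sentence_has_word


balanced = satisfies_duality({"hello", "world"}, {"hello world"})
orphan_word = satisfies_duality({"hello", "moon"}, {"hello world"})
```

The second call fails the first conjunct, because the word "moon" appears in no sentence of the corpus.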

Axiom S.2: The Extraction Axiom

\[\forall \zeta \in C_{L} : \forall i \in N_{\Lambda(\zeta)}: \zeta\{i\} \in L\]
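
The notation \(\Lambda(\zeta)\) and \(\zeta\{i\}\) is not defined in this section. The following Python sketch assumes that \(\Lambda(\zeta)\) is the number of words in \(\zeta\) and that \(\zeta\{i\}\) is the \(i\)-th delimiter-separated word; under that assumption the axiom says every extracted word belongs to the language. The names `words_of` and `satisfies_extraction` and the sample sets are hypothetical.

```python
DELIMITER = " "  # assumed concrete choice for the delimiter (sigma)


def words_of(sentence):
    """Split a sentence on delimiters, discarding empty fragments."""
    return [w for w in sentence.split(DELIMITER) if w]


def satisfies_extraction(language, corpus):
    """Check that every word extracted from every sentence is in the language."""
    return all(w in language for sentence in corpus for w in words_of(sentence))


closed = satisfies_extraction({"hello", "world"}, {"hello world", "world hello"})
leaky = satisfies_extraction({"hello"}, {"hello world"})
```

The second call fails because "world" can be extracted from the sentence but is not a word of the toy language.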

Axiom S.3: The Finite Axiom

\[\exists N \in \mathbb{N}: \forall \zeta \in C_L: l(\zeta) \leq N\]
