Acyclicity and antisymmetryWe perform a (not so successful) experiment that proves a simple graph-theoretic property (an induced sub-graph of a directed acyclic graph is also acyclic) in relational calculus.— 0019 · 5 Apr ’21
I am interested in the discovery of mathematical and logical structures of computation so that they can be manifested in programming languages and effectively guide the programmer. I approach problems mainly from the perspective of intuitionistic type theory, which substantially overlaps with functional programming. For theoretic modelling, my weapon of choice is Agda; for practical programming, I use Haskell.
I have been exploring the potential of dependently typed programming, where sophisticated correctness properties and proofs can be integrated into types and programs, and mechanically verified by type-checking. More interestingly, with an interactive development environment (like the Emacs mode offered by Agda) that provides semantic hints in the form of type information, the programmer can gain better ‘situation awareness’ regarding the meaning of the program being constructed. (See, for example, my paper on programming metamorphic algorithms.) This is made possible primarily because of inductive families, and my DPhil thesis developed datatype-generic techniques for improving usability and reusability of inductive families. I also studied the Algebra of Programming (a.k.a. Bird–Meertens formalism or program calculation) and contributed to its dependently typed formalisation.
I have also worked on bidirectional transformations for solving consistency maintenance (synchronisation) problems. More specifically, my focus was on bidirectional programming, which is largely synonymous with the construction of lenses. My main results in this area were built around a small ‘putback-based’ bidirectional programming language BiGUL (Bidirectional Generic Update Language), about which there were an Agda formalisation, a Haskell port, and an axiomatic semantics that helps to explain how to think about bidirectional programs unidirectionally.
EmploymentStarting from the end of 2019 I am an assistant research fellow (which is equivalent to the rank of assistant professor) at the Institute of Information Science (IIS), Academia Sinica, Taiwan, and a member of the Programming Languages and Formal Methods Group within IIS. Previously I was a postdoctoral researcher under Zhenjiang Hu at the National Institute of Informatics, Japan during 2014–2019. I also worked as a research assistant of Shin-Cheng Mu at IIS during 2007–2009 (part-time) and briefly in 2010 (full-time).
EducationI was born and grew up in Changhua, Taiwan. I did my BS in Computer Science and Information Engineering with a minor in Mathematics at National Taiwan University, Taiwan during 2005–2009. After finishing my one-year compulsory substitute service, I went to the University of Oxford, UK in 2010 for a DPhil in Computer Science supervised by Jeremy Gibbons, and was admitted to the degree in 2014.
My Chinese name 向上 (Hsiang-Shang), which I publish under, means ‘upwards’ and is an unusual one; sadly, it’s impossible to approximate its pronunciation acceptably accurately in English. On the other hand, ‘Josh’ is a name I’ve used for years (since 1996, to be precise), so I am perfectly happy (in fact happier) to be called by the name in English. Even in a Chinese-speaking environment, using an English name can be helpful in avoiding the strictly hierarchical and somewhat distant way of addressing people in East Asia and thus breaking one of the mental barriers to equality.
I listen to classical music (mainly from the Romantic period) and play the piano — I own a Yamaha AvantGrand N1X and strive to practise daily. For piano music, I feel closest to Chopin, and I’m getting to know Beethoven and Schubert. More broadly, I also admire Brahms’s and Tchaikovsky’s work, and start to appreciate Mahler’s symphonies and even a bit of Stravinsky.
I’m attracted to technical and operational aspects of civil aviation; I collect 1/200 and 1/400 airliner models, and I’m learning to fly the Boeing 777 in X-Plane.
There have been investigations into type-theoretic foundations for metaprogramming, notably Davies and Pfenning’s (2001) treatment in S4 modal logic, where code evaluating to values of type A is given the modal type
Code A (
□A in the original paper). Recently Kavvos (2017) extended PCF with
Code A and intensional recursion, understood as the deductive form of the GL (Gödel-Löb) axiom in provability logic, but the resulting type system is logically inconsistent. Inspired by staged computation, we observe that a term of type
Code A is, in general, code to be evaluated in a next stage, whereas S4 modal type theory is a special case where code can be evaluated in the current stage, and the two types of code should be discriminated. Consequently, we use two separate modalities
□ to model S4 and GL respectively in a unified categorical framework while retaining logical consistency. Following Kavvos’ (2017) novel approach to the semantics of intensionality, we interpret the two modalities in the 𝒫-category of assemblies and trackable maps. For the GL modality
□ in particular, we use guarded type theory to articulate what it means by a ‘next’ stage and to model intensional recursion by guarded recursion together with Kleene’s second recursion theorem. Besides validating the S4 and GL axioms, our model better captures the essence of intensionality by refuting congruence, i.e. two extensionally equal terms may not be intensionally equal, and the generic internal quoting A →
□A as well as A →
⊠A. Our results are developed in (guarded) homotopy type theory and formalised in Agda.
Bidirectional transformations (bx) are mechanisms for maintaining the consistency of multiple artefacts. Some of the challenges bx research aims to address include answering fundamental questions such as how best to precisely characterise consistency, “good” consistency maintainers, and the required input and assumptions to guarantee this good behaviour.
While substantial progress has been made towards unifying the different (variants of) formal frameworks for bx, many of these formal results are not yet easily accessible to more practically-oriented bx researchers, often due to a missing background in (advanced) category theory. This is an unfortunate situation, as many bx tools are developed and maintained by practically-oriented bx researchers, who would but currently cannot fully profit from formal results and insights. In particular, we are not aware of any practical implementation of C-lenses, not even preliminary discussions about its relevance to practical bx. We believe this is because of the inaccessibility of the categorical language, and write this paper to decipher C-lenses for bx researchers without any substantial background in category theory. Our goal is to spark broader interest in C-lenses and in discussions on their usefulness in practice and potential for improving existing, and inspiring new bx tools.
We start by reviewing the well-known state-based bx framework, already accessible to a broad audience due to its simplicity. We then explain how this framework can be elegantly generalised to the richer, delta-based setting of C-lenses using multiple examples and illustrative diagrams.
2.6.1 with Standard Library version 1.3
In dependently typed programming, proofs of basic, structural properties can be embedded implicitly into programs and do not need to be written explicitly. Besides saving the effort of writing separate proofs, a most distinguishing and fascinating aspect of dependently typed programming is that it makes the idea of interactive type-driven development much more powerful, where expressive type information becomes useful hints that help the programmer to complete a program. There have not been many attempts at exploiting the full potential of the idea, though. As a departure from the usual properties dealt with in dependently typed programming, and as a demonstration that the idea of interactive type-driven development has more potential to be discovered, we conduct an experiment in ‘type-driven algorithm design’: we develop algorithms from their specifications encoded in sophisticated types, to see how useful the hints provided by a type-aware interactive development environment can be. The algorithmic problem we choose is metamorphisms, whose definitional behaviour is consuming a data structure to compute an intermediate value and then producing a codata structure from that value, but there are other ways to compute metamorphisms. We develop Gibbons’s streaming algorithm and Nakano’s jigsaw model in the interactive development environment provided by the dependently typed language Agda, turning intuitive ideas about these algorithms into formal conditions and programs that are correct by construction.
This paper introduces the design of a snapshot-consistent flash translation layer (SCFTL) for flash disks, which has a stronger guarantee about the possible behavior after a crash than conventional designs. More specifically, the flush operation of SCFTL also has the functionality of making a “disk snapshot.” When a crash occurs, the flash disk is guaranteed to recover to the state right before the last flush. The major benefit of SCFTL is that it allows a more efficient design of upper layers in the storage stack. For example, the file system built on SCFTL does not require the use of a journal for crash recovery. Instead, it only needs to perform a flush operation of SCFTL at the end of each atomic transaction. We use a combination of a proof assistant, a symbolic executor, and an SMT solver, to formally verify the correctness of our SCFTL implementation. We modify the xv6 file system to support group commit and utilize SCFTL’s stronger crash guarantee. Our evaluation using file system benchmarks shows that the modified xv6 on SCFTL is 3 to 30 times faster than xv6 with logging on conventional FTLs, and is in the worst case only two times slower than the state-of-the-art setting: the ext4 file system on the Physical Block Device (pblk) FTL.
Language designers usually need to implement parsers and printers. Despite being two closely related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It will be more convenient if the parser and printer can be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar.
To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for a fully disambiguated context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi’s ‘resugaring’, simple refactoring, and language evolution.
Bidirectional transformations (bx) are relevant for a wide range of application domains. While bx problems may be solved with unidirectional languages and tools, maintaining separate implementations of forward and backward synchronizers with mutually consistent behavior can be difficult, laborious, and error-prone. To address the challenges involved in handling bx problems, dedicated languages and tools for bx have been developed. Due to their heterogeneity, however, the numerous and diverse approaches to bx are difficult to compare, with the consequence that fundamental differences and similarities are not yet well understood. This motivates the need for suitable benchmarks that facilitate the comparison of bx approaches. This paper provides a comprehensive treatment of benchmarking bx, covering theory, implementation, application, and assessment. At the level of theory, we introduce a conceptual framework that defines and classifies architectures of bx tools. At the level of implementation, we describe Benchmarx, an infrastructure for benchmarking bx tools which is based on the conceptual framework. At the level of application, we report on a wide variety of solutions to the well-known Families-to-Persons benchmark, which were developed and compared with the help of Benchmarx. At the level of assessment, we reflect on the usefulness of the Benchmarx approach to benchmarking bx, based on the experiences gained from the Families-to-Persons benchmark.
Based on Foster et al.’s lenses, various bidirectional programming languages and systems have been developed for helping the user to write correct data synchronisers. The two well-behavedness laws of lenses, namely Correctness and Hippocraticness, are usually adopted as the guarantee of these systems. While lenses are designed to retain information in the source when the view is modified, well-behavedness says very little about the retaining of information: Hippocraticness only requires that the source be unchanged if the view is not modified, and nothing about information retention is guaranteed when the view is changed. To address the problem, we propose an extension of the original lenses, called retentive lenses, which satisfy a new Retentiveness law guaranteeing that if parts of the view are unchanged, then the corresponding parts of the source are retained as well. As a concrete example of retentive lenses, we present a domain-specific language for writing tree transformations; we prove that the pair of get and put functions generated from a program in our DSL forms a retentive lens. We demonstrate the practical use of retentive lenses and the DSL by presenting case studies on code refactoring, Pombrio and Krishnamurthi’s resugaring, and XML synchronisation.
Session types are a type discipline for eliminating communication errors in concurrent computing. These types can be thought of as a representation of communication protocols implemented by communicating processes. One application scenario that can be naturally supported by session types is semantics-preserving transformation of processes in response to protocol changes due to optimization, evolution, refactoring, etc. Such transformation can be seen as a particular kind of synchronization problem that has long been studied by the bidirectional transformations (BX) community. This short paper offers a preliminary analysis of the process–type synchronization problem in terms of BX, describing the prospects and challenges.
Languages for programming state-based asymmetric lenses are usually based on lens combinators, whose style, having a functional programming origin, is alien to most programmers and awkward to use even for experienced functional programmers. We propose a visual syntax mimicking circuit diagrams for the combinator-based language BiGUL, provide a relational interpretation that allows the diagrams to be understood bidirectionally, and sketch how an editor for the visual syntax can help to construct, understand, and debug lens combinator programs in an intuitive and friendly way.
Putback-based bidirectional programming allows the programmer to write only one putback transformation, from which the unique corresponding forward transformation is derived for free. A key distinguishing feature of putback-based bidirectional programming is full control over the bidirectional behavior, which is important for specifying intended bidirectional transformations without any ambiguity. In this chapter, we will introduce BiGUL, a simple yet powerful putback-based bidirectional programming language, explaining the underlying principles and showing how various kinds of bidirectional application can be developed in BiGUL.
2.5.2 with Standard Library version 0.13
Among the frameworks of bidirectional transformations proposed for addressing various synchronisation (consistency maintenance) problems, Foster et al.’s  asymmetric lenses have influenced the design of a generation of bidirectional programming languages. Most of these languages are based on a declarative programming model, and only allow the programmer to describe a consistency specification with ad hoc and/or awkward control over the consistency restoration behaviour. However, synchronisation problems are diverse and require vastly different consistency restoration strategies, and to cope with the diversity, the programmer must have the ability to fully control and reason about the consistency restoration behaviour. The putback-based approach to bidirectional programming aims to provide exactly this ability, and this paper strengthens the putback-based position by proposing the first fully fledged reasoning framework for a bidirectional language — a Hoare-style logic for Ko et al.’s  putback-based language BiGUL. The Hoare-style logic lets the BiGUL programmer precisely characterise the bidirectional behaviour of their programs by reasoning solely in the putback direction, thereby offering a unidirectional programming abstraction that is reasonably straightforward to work with and yet provides full control not achieved by previous approaches. The theory has been formalised and checked in Agda, but this paper presents the Hoare-style logic in a semi-formal way to make it easily understood and usable by the working BiGUL programmer.
Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel’s vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access — reading or writing attributes of other vertices through references — they cannot handle the above mentioned dynamic communication, causing a class of Pregel algorithms with fast convergence impossible to implement.
To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation result shows that the efficiency of Palgol is comparable with that of hand-written code.
This paper has been superseded by an extended version in Software and Systems Modelling.
Bidirectional transformation (bx) approaches provide a systematic way of specifying, restoring, and maintaining the consistency of related models. The current diversity of bx approaches is certainly beneficial, but it also poses challenges, especially when it comes to comparing the different approaches and corresponding bx tools that implement them. Although a benchmark for bx (referred to as a benchmarx) has been identified in the community as an important and currently still missing contribution, only a rather abstract description and characterisation of what a benchmarx should be has been published to date. In this paper, therefore, we focus on providing a practical and pragmatic framework, on which future concrete benchmarx can be built. To demonstrate its feasibility, we present a first non-trivial benchmarx based on a well-known example, and use it to compare and evaluate three bx tools, chosen to cover the broad spectrum of bx approaches.
220.127.116.11 with Standard Library version 0.12
Dependently typed programming advocates the use of various indexed versions of the same shape of data, but the formal relationship amongst these structurally similar datatypes usually needs to be established manually and tediously. Ornaments have been proposed as a formal mechanism to manage the relationships between such datatype variants. In this paper, we conduct a case study under an ornament framework; the case study concerns programming binomial heaps and their operations — including insertion and minimum extraction — by viewing them as lifted versions of binary numbers and numeric operations. We show how current dependently typed programming technology can lead to a clean treatment of the binomial heap constraints when implementing heap operations. We also identify some gaps between the current technology and an ideal dependently typed programming language that we would wish to have for our development.
This paper has been superseded by an extended version in New Generation Computing.
Language designers usually need to implement parsers and printers. Despite being two intimately related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It will be more convenient if the parser and printer can be unified and developed in one single program, with their consistency guaranteed automatically.
Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can reflect the modification to the program text while preserving layouts, comments, and syntactic sugar.
To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for an unambiguous context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the pairs of parsers and reflective printers generated by BiYacc are consistent. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi’s ‘resugaring’, language evolution, and refactoring.
18.104.22.168 with Standard Library version 0.11
The version of BiGUL described in this paper is outdated. See the ‘axiomatic basis’ paper for a semi-formal axiomatic introduction to the current BiGUL.
Putback-based bidirectional programming allows the programmer to write only one putback transformation, from which the unique corresponding forward transformation is derived for free. The logic of a putback transformation is more sophisticated than that of a forward transformation and does not always give rise to well-behaved bidirectional programs; this calls for more robust language design to support development of well-behaved putback transformations. In this paper, we design and implement a concise core language BiGUL for putback-based bidirectional programming to serve as a foundation for higher-level putback-based languages. BiGUL is completely formally verified in the dependently typed programming language Agda to guarantee that any putback transformation written in BiGUL is well-behaved.
the more complete SLE’16 paper, which, in particular, describes the internals of BiYacc in much more detail.
Language designers usually need to implement parsers and printers. Despite being two related programs, in practice they are designed and implemented separately. This approach has an obvious disadvantage: as a language evolves, both its parser and printer need to be separately revised and kept synchronised. Such tasks are routine but complicated and error-prone. To facilitate these tasks, we propose a language called BiYacc, whose programs denote both a parser and a printer. In essence, BiYacc is a domain-specific language for writing putback-based bidirectional transformations — the printer is a putback transformation, and the parser is the corresponding get transformation. The pairs of parsers and printers generated by BiYacc are thus always guaranteed to satisfy the usual round-trip properties. The highlight that distinguishes this reflective printer from others is that the printer — being a putback transformation — accepts not only an abstract syntax tree but also a string, and produces an updated string consistent with the given abstract syntax tree. We can thus make use of the additional input string, with mechanisms such as simultaneous pattern matching on the view and the source, to provide users with full control over the printing-strategies.
2.5.2 with Standard Library version 0.13
The revised version uses https DOI links and fixes identifier hyperlinks and a few typos in the Bodleian version.
Based on a natural unification of logic and computation, Martin-Löf’s intuitionistic type theory can be regarded simultaneously as a computationally meaningful higher-order logic system and an expressively typed functional programming language, in which proofs and programs are treated as the same entities. Two modes of programming can then be distinguished: in externalism, we construct a program separately from its correctness proof with respect to a given specification, whereas in internalism, we encode the specification in a sophisticated type such that any program inhabiting the type also encodes a correctness proof, and we can use type information as a guidance on program construction. Internalism is particularly effective in the presence of inductive families, whose design can have a strong influence on program structure. Techniques and mechanisms for facilitating internalist programming are still lacking, however.
This dissertation proposes that internalist programming can be facilitated by exploiting an interconnection between internalism and externalism, expressed as isomorphisms between inductive families into which data structure invariants are encoded and their simpler variants paired with predicates expressing those invariants. The interconnection has two directions: one analysing inductive families into simpler variants and predicates, and the other synthesising inductive families from simpler variants and specific predicates. They respectively give rise to two applications, one achieving a modular structure of internalist libraries, and the other bridging internalist programming with relational specifications and program derivation. The datatype-generic mechanisms supporting the applications are based on McBride’s ornaments. Theoretically, the key ornamental constructs — parallel composition of ornaments and relational algebraic ornamentation — are further characterised in terms of lightweight category theory. Most of the results are completely formalised in the Agda programming language.
2.3.3 with Standard Library version 0.6
my DPhil dissertation, which subsumes and streamlines the content of this paper, and includes more examples.
Dependently typed programming is hard, because ideally dependently typed programs should share structure with their correctness proofs, but there are very few guidelines on how one can arrive at such integrated programs. McBride’s algebraic ornamentation provides a methodological advancement, by which the programmer can derive a datatype from a specification involving a fold, such that a program that constructs elements of that datatype would be correct by construction. It is thus an effective method that leads the programmer from a specification to a dependently typed program. We enhance the applicability of this method by generalising algebraic ornamentation to a relational setting and bringing in relational algebraic methods, resulting in a hybrid approach that makes essential use of both dependently typed programming and relational program derivation. A dependently typed solution to the minimum coin change problem is presented as a demonstration of this hybrid approach. We also give a theoretically interesting ‘completeness theorem’ of relational algebraic ornaments, which sheds some light on the expressive power of ornaments and inductive families.
my DPhil dissertation, which subsumes and streamlines the content of this paper, and includes more examples.
Dependently typed programmers are encouraged to use inductive families to integrate constraints with data construction. Different constraints are used in different contexts, leading to different versions of datatypes for the same data structure. For example, sequences might be constrained by length or by an ordering on elements, giving rise to different datatypes ‘vectors’ and ‘sorted lists’ for the same underlying data structure of sequences. Modular implementation of common operations for these structurally similar datatypes has been a longstanding problem. We propose a datatype-generic solution, in which we axiomatise a family of isomorphisms between datatypes and their more refined versions as datatype refinements, and show that McBride’s ornaments can be translated into such refinements. With the ornament-induced refinements, relevant properties of the operations can be separately proven for each constraint, and after the programmer selects several constraints to impose on a basic datatype and synthesises a new datatype incorporating those constraints, the operations can be routinely upgraded to work with the synthesised datatype.
Same topic at the Workshop on Dependently Typed Programming (DTP) 2011, and ‘numerical representations à la ornamentation’ at Fun in the Afternoon 2012
This paper has been superseded by an extended version in Progress in Informatics.
Dependently typed programmers are encouraged to use inductive families to integrate constraints with data construction. Different constraints are used in different contexts, leading to different versions of datatypes for the same data structure. Modular implementation of common operations for these structurally similar datatypes has been a longstanding problem. We propose a datatype-generic solution based on McBride’s datatype ornaments, exploiting an isomorphism whose interpretation borrows ideas from realisability. Relevant properties of the operations are separately proven for each constraint, and after the programmer selects several constraints to impose on a basic datatype and synthesises an inductive family incorporating those constraints, the operations can be routinely upgraded to work with the synthesised inductive family.
Relational program derivation is the technique of stepwise refining a relational specification to a program by algebraic rules. The program thus obtained is correct by construction. Meanwhile, dependent type theory is rich enough to express various correctness properties to be verified by the type checker. We have developed a library, AoPA, to encode relational derivations in the dependently typed programming language Agda. A program is coupled with an algebraic derivation whose correctness is guaranteed by the type system. Two non-trivial examples are presented: an optimisation problem, and a derivation of quicksort where well-founded recursion is used to model terminating hylomorphisms in a language with inductive types.
This paper has been superseded by an extended version in the Journal of Functional Programming.
Dependent type theory is rich enough to express that a program satisfies an input/output relational specification, but it could be hard to construct the proof term. On the other hand, squiggolists know very well how to show that one relation is included in another by algebraic reasoning. We demonstrate how to encode functional and relational derivations in a dependently typed programming language. A program is coupled with an algebraic derivation from a specification, whose correctness is guaranteed by the type system.