Language Classification
We do not intend to include natural languages in our technology models, since this would make properties more difficult to verify. Every language we discuss is used in the context of software engineering, and for each one we can show that there exists either an artifact that defines it or a technology that implements it.
In the past, there have been many approaches to classifying programming languages, and this topic is quite well understood. We are now interested in how to classify any machine-readable language. As suggested in a 2012 paper by Shilov et al., a multidimensional approach is necessary when classifying software languages. The figure above gives an overview of our classification dimensions, which we discuss next:
- We will discuss the purpose-based classification more deeply in an extra subsection later.
- For programming languages, developers tend to classify by paradigm, such as object orientation (see Concept for more). We say that a language facilitates the use of a paradigm through the language constructs it provides. It is important to state that it only facilitates the use of a paradigm, since you could write a Java program without actually following object orientation.
- We already said that a language has some specific language constructs that are related to paradigms. Thus, we can also tell which constructs are provided by a language, which makes languages neatly comparable in this dimension.
- While speaking of constructs, there is one dimension that is essential: the kind of notation in which a language is encoded. Most languages are encoded in some textual syntax, but others only have a visual syntax (e.g., UML) and need a specific exchange format (e.g., XMI) to enable data exchange. Other kinds of notation, such as markup, markdown, or icon-based notations in visual languages (see BPMN), are also important. These play an assisting role, since they only provide a better structure for textual or visual syntax (see RelaxNG's formats).
- Another typical classification is that of language families. A language family typically has an ancestor (e.g., C) and several descendants (e.g., C++ and Java).
- In analogy to technologies, languages may also belong to several technology spaces and provide support in several programming domains. For example, OCL belongsTo MDEWare and OCL supports QualityAssurance.
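The dimensions above can be read as independent fields of one record per language. The following is a minimal sketch under that assumption, not the project's own data model: the class `LanguageProfile`, its field names, and the helper functions are hypothetical, and the construct names given for Java and C++ are illustrative placeholders. Only the OCL facts (belongsTo MDEWare, supports QualityAssurance) and the C family example come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class LanguageProfile:
    """Hypothetical record: one language described along several dimensions."""
    name: str
    paradigms: FrozenSet[str] = frozenset()          # paradigms the language facilitates
    constructs: FrozenSet[str] = frozenset()         # language constructs it provides
    notation: str = "textual"                        # e.g. "textual" or "visual"
    ancestor: Optional[str] = None                   # head of the language family, if any
    technology_spaces: FrozenSet[str] = frozenset()  # targets of belongsTo
    domains: FrozenSet[str] = frozenset()            # targets of supports


def shared_constructs(a: LanguageProfile, b: LanguageProfile) -> FrozenSet[str]:
    """Compare two languages in the construct dimension."""
    return a.constructs & b.constructs


def same_family(a: LanguageProfile, b: LanguageProfile) -> bool:
    """Two languages share a family if they descend from the same ancestor."""
    return a.ancestor is not None and a.ancestor == b.ancestor


# Facts taken from the text: OCL belongsTo MDEWare, OCL supports QualityAssurance.
OCL = LanguageProfile(
    name="OCL",
    technology_spaces=frozenset({"MDEWare"}),
    domains=frozenset({"QualityAssurance"}),
)

# The construct sets below are illustrative assumptions, not an exhaustive list.
JAVA = LanguageProfile(
    name="Java",
    paradigms=frozenset({"ObjectOrientation"}),
    constructs=frozenset({"class", "interface", "method"}),
    ancestor="C",
)
CPP = LanguageProfile(
    name="C++",
    paradigms=frozenset({"ObjectOrientation"}),
    constructs=frozenset({"class", "template", "method"}),
    ancestor="C",
)
```

With these records, `shared_constructs(JAVA, CPP)` yields the constructs both languages provide, and `same_family(JAVA, CPP)` holds because both name C as their ancestor. Keeping each dimension as its own field means a language can be compared in one dimension without committing to a value in another.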
Shilov et al. suggest a classification by the purpose of a language. Every language has an original intended purpose, so the classification relates to the kinds of artifacts that are created. This prevents us from stating that LaTeX is a programming language just because you can write small programs with it that help produce a nice-looking document. Its original intention remains writing documents, which classifies it as a DocumentLanguage. We propose the following list of purpose-based classifiers.
- ProgrammingLanguage: A language intended to write programs with.
- AssemblyLanguage: A language intended to write assembly code with that is processed by an assembler.
- MachineLanguage: A language intended to write machine code with that can be interpreted by hardware.
- CommandLineLanguage: Such languages consist of statements that can only be posed in a command line interface. Often there is no grammar file that defines the syntax; a technology may directly implement the language instead.
- BuildScriptLanguage: A language to write programs that manage the software build, e.g., Maven.
- DocumentLanguage: A language intended for creating documents. Most textual document languages are structured through markup. Thus, we say that most languages that are mainly classified as markup languages should be classified as document languages.
- StylesheetLanguage: Languages can be specifically intended for writing stylesheets, e.g., CSS. Such stylesheets relate to a document and describe how it is presented.
- DataExchangeLanguage: Languages such as XML or JSON are used mainly for the purpose of data exchange between separate or even distributed systems.
- KnowledgeRepresentationLanguage: A language that is used to persist knowledge, typically using a controlled vocabulary, e.g., RDF or OWL.
- SchemaLanguage: It is a language to formulate a data model that structures the kind of data that can exist and how it can (has to) be interrelated.
- TransformationLanguage: Languages such as ATL, XSLT or DML are used to program rules of transformations.
- QueryLanguage: Languages can be used to write down queries as simple or complex read statements. Queries can be posed to databases, models, etc.
- GrammarLanguage: Such languages offer vocabulary for writing down the grammar of a language. Thus, they define what kind of textual or visual (see graph grammars) syntactic elements exist.
- TemplateLanguage: A language that can be used to write templates.
- SoftwareDesignLanguage: A language to describe and/or prescribe the design of a software system.
- ConfigurationLanguage: A language to encode some simple setting properties in a configuration file.
- LogLanguage: A text-based language in which logs are persisted.
- ReferenceLanguage: A text-based language to encode references, such as URI.
- MessageLanguage: A language used for communication between programs through messages, as typically found in distributed systems, e.g., HTTP requests or messages used by JMS.
- ValueLanguage: Such languages encode any kind of data at runtime that can only be seen through a program.
- ContainerLanguage: A file format to encode file containers that are typically compressed, such as .zip or .7z.
- DatabaseLanguage: A file format to encode a database that is typically in a binary format to persist large amounts of data.
- GraphicsLanguage: A file format to encode rendered graphics on a screen.
- subsetOf: In some cases, we only speak of specific subsets of a language. For example, ANTLR does not produce arbitrary Java code, but only a specific subset of Java. Having to name a defining artifact or implementing technology prevents us from talking about arbitrary language subsets.
- embeddedInto: When programming in a higher-level language, one might still want to rely on language constructs from other languages. To achieve this, one embeds language constructs from some DSL into the higher-level language.
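The purpose classifiers and the two relationships can be combined into a small consistency check. The sketch below is a hypothetical illustration, not the project's Checker Tool: the `Purpose` enum lists only an excerpt of the classifiers, and the names `Language`, `is_grounded`, `check`, and `JavaSubsetFromANTLR`, as well as the concrete `defined_by` and `implemented_by` values, are assumptions chosen for the example. It encodes the rule that every language, including every subset, must name a defining artifact or an implementing technology.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Purpose(Enum):
    """Excerpt of the purpose-based classifiers listed above."""
    PROGRAMMING_LANGUAGE = auto()
    DOCUMENT_LANGUAGE = auto()
    DATA_EXCHANGE_LANGUAGE = auto()
    QUERY_LANGUAGE = auto()


@dataclass(frozen=True)
class Language:
    name: str
    purpose: Purpose
    defined_by: Optional[str] = None      # artifact that defines the language
    implemented_by: Optional[str] = None  # technology that implements it
    subset_of: Optional[str] = None       # target of subsetOf
    embedded_into: Optional[str] = None   # target of embeddedInto


def is_grounded(lang: Language) -> bool:
    """A language needs a defining artifact or an implementing technology."""
    return lang.defined_by is not None or lang.implemented_by is not None


def check(languages: List[Language]) -> List[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    declared = {lang.name for lang in languages}
    problems = []
    for lang in languages:
        if not is_grounded(lang):
            problems.append(
                f"{lang.name}: no defining artifact or implementing technology"
            )
        relations = (("subsetOf", lang.subset_of), ("embeddedInto", lang.embedded_into))
        for relation, target in relations:
            if target is not None and target not in declared:
                problems.append(f"{lang.name}: {relation} target {target} is not declared")
    return problems


# Illustrative values; the artifact and technology names are assumptions.
JAVA = Language("Java", Purpose.PROGRAMMING_LANGUAGE,
                defined_by="Java Language Specification")
ANTLR_JAVA = Language("JavaSubsetFromANTLR", Purpose.PROGRAMMING_LANGUAGE,
                      implemented_by="ANTLR", subset_of="Java")
# LaTeX stays a document language even though small programs can be written in it.
LATEX = Language("LaTeX", Purpose.DOCUMENT_LANGUAGE, implemented_by="TeX Live")
```

Calling `check([JAVA, ANTLR_JAVA, LATEX])` reports no problems, whereas a subset declared without `defined_by` or `implemented_by` is rejected. This mirrors the argument in the text: the grounding requirement is what rules out arbitrary language subsets.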