Cybersquare

Scripting: Higher-Level Programming for the 21st Century

John K. Ousterhout
Sun Microsystems Laboratories

Computer, March 1998. 0018-9162/98/$10.00 © 1998 IEEE

Increases in computer speed and changes in the application mix are making scripting languages more and more important for the applications of the future. Scripting languages differ from system programming languages in that they are designed for “gluing” applications together. They use typeless approaches to achieve a higher level of programming and more rapid application development than system programming languages.

For the past 15 years, a fundamental change has been occurring in the way people write computer programs. The change is a transition from system programming languages such as C or C++ to scripting languages such as Perl or Tcl. Although many people are participating in the change, few realize that the change is occurring and even fewer know why it is happening. This article explains why scripting languages will handle many of the programming tasks in the next century better than system programming languages.

Scripting languages are designed for different tasks than are system programming languages, and this leads to fundamental differences in the languages. System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as words of memory. In contrast, scripting languages are designed for gluing: They assume the existence of a set of powerful components and are intended primarily for connecting components. System programming languages are strongly typed to help manage complexity, while scripting languages are typeless to simplify connections among components and provide rapid application development.

Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960s have included both kinds of languages. The languages are typically used together in component frameworks, where components are created with system programming languages and glued together with scripting languages. However, several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces (GUIs) and component architectures, and the growth of the Internet, have greatly expanded the applicability of scripting languages. These trends will continue over the next decade, with more and more new applications written entirely in scripting languages and system programming languages used primarily for creating components.

SYSTEM PROGRAMMING LANGUAGES

To understand the differences between scripting languages and system programming languages, it is important to understand how system programming languages evolved. System programming languages were introduced as an alternative to assembly languages. In assembly languages, virtually every aspect of the machine is reflected in the program. Each statement represents a single machine instruction and programmers must deal with low-level details such as register allocation and procedure-calling sequences. As a result, it is difficult to write and maintain large programs in assembly languages.

By the late 1950s, higher-level languages such as Lisp, Fortran, and Algol began to appear. In these languages, statements no longer correspond exactly to machine instructions; a compiler translates each statement in the source program into a sequence of binary instructions. Over time a series of system programming languages evolved from Algol, including PL/1, Pascal, C, C++, and Java. System programming languages are less efficient than assembly languages but they allow applications to be developed much more quickly. As a result, system programming languages have almost completely replaced assembly languages for the development of large applications.

Higher-level languages

System programming languages differ from assembly languages in two ways: they are higher level and they are strongly typed. The term “higher level” means that many details are handled automatically, so programmers can write less code to get the same job done. For example:

• Register allocation is handled by the compiler so that programmers need not write code to move information between registers and memory.
• Procedure calling sequences are generated automatically; programmers need not worry about moving arguments to and from the call stack.
• Programmers can use simple keywords such as while and if for control structures; the compiler generates all the detailed instructions to implement the control structures.

On average, each line of code in a system programming language translates to about five machine instructions, compared with one instruction per line in an assembly program. (In an informal analysis of eight C files written by five different people, I found that the ratio ranged from three to seven instructions per line [1]; in a study of numerous languages, Capers Jones found that, for a given task, assembly languages require three to six times as many lines of code as system programming languages [2].) Programmers can write roughly the same number of lines of code per year regardless of language [3], so system programming languages allow applications to be written much more quickly than assembly languages.

Typing

The second difference between assembly languages and system programming languages is typing. I use the term typing to refer to the degree to which the meaning of information is specified in advance of its use. In a strongly typed language, the programmer declares how each piece of information will be used, and the language prevents the information from being used in any other way. In a weakly typed language, there are no a priori restrictions on how information can be used; the meaning of information is determined solely by the way it is used, not by any initial promises.

Modern computers are fundamentally typeless. Any word in memory can hold any kind of value, such as an integer, a floating-point number, a pointer, or an instruction. The meaning of a value is determined by how it is used. If the program counter points at a word of memory then it is treated as an instruction; if a word is referenced by an integer add instruction, then it is treated as an integer; and so on. The same word can be used in different ways at different times.

In contrast, today’s system programming languages are strongly typed. For example:

• Each variable in a system programming language must be declared with a particular type such as integer or pointer to string, and it must be used in ways that are appropriate for the type.
• Data and code are segregated; it is difficult if not impossible to create new code on the fly.
• Variables can be collected into structures or objects with well-defined substructure and procedures or methods to manipulate them. An object of one type cannot be used where an object of a different type is expected.

Typing has several advantages. First, it makes large programs more manageable by clarifying how things are used and differentiating among things that must be treated differently. Second, compilers use type information to detect certain kinds of errors, such as an attempt to use a floating-point value as a pointer. Third, typing improves performance by allowing compilers to generate specialized code. For example, if a compiler knows that a variable always holds an integer value, then it can generate integer instructions to manipulate the variable; if the compiler does not know the type of a variable, then it must generate additional instructions to check the variable’s type at runtime.

Figure 1 compares a variety of languages on the basis of their level of programming and strength of typing.

SCRIPTING LANGUAGES

Scripting languages such as Perl [4], Python [5], Rexx [6], Tcl [7], Visual Basic, and the Unix shells represent a very different style of programming than do system programming languages. Scripting languages assume that a collection of useful components already exist in other languages. They are intended not for writing applications from scratch but rather for combining components. For example, Tcl and Visual Basic can be used to arrange collections of user interface controls on the screen, and Unix shell scripts are used to assemble filter programs into pipelines. Scripting languages are often used to extend the features of components; however, they are rarely used for complex algorithms and data structures, which are usually provided by the components. Scripting languages are sometimes referred to as glue languages or system integration languages.

Scripting languages are generally typeless

To simplify the task of connecting components, scripting languages tend to be typeless. All things look and behave the same so that they are interchangeable. For example, in Tcl or Visual Basic a variable can hold a string one moment and an integer the next. Code and data are often interchangeable, so that a program can write another program and then execute it on the fly.
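
The two properties just described can be illustrated with a minimal sketch. It uses Python, one of the scripting languages named above, as a stand-in for Tcl or Visual Basic; the variable names (value, source, namespace) and the generated function double are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; all names here are hypothetical.

# 1. A variable is not tied to a declared type: the same name can hold
#    a string at one moment and an integer the next.
value = "42"
kind_before = type(value).__name__   # name of the type currently held
value = int(value) + 1               # the same variable now holds an integer
kind_after = type(value).__name__

# 2. Code and data are interchangeable: the program builds another
#    program as an ordinary string, then executes it on the fly.
source = "def double(x):\n    return 2 * x\n"
namespace = {}
exec(source, namespace)              # run the generated program text
result = namespace["double"](value)  # call the function it defined

print(kind_before, kind_after, result)
```

Nothing in the sketch declares a type or separates program text from data, which is the flexibility this section attributes to scripting languages; the cost, discussed later in the article, is that any misuse is detected only when a value is actually used.
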
Scripting languages are often string-oriented, as this provides a uniform representation for many different things. A typeless language makes it much easier to hook together components. There are no a priori restrictions on how things can be used, and all components and values are represented in a uniform fashion. Thus any component or value can be used in any situation; components designed for one purpose can be used for totally different purposes never foreseen by the designer. For example, in the Unix shells all filter programs read a stream of bytes from an input and write a stream of bytes to an output. Any two programs can be connected by attaching the output of one program to the input of the other. The following shell command stacks three filters together to count the number of lines in the selection that contain the word “scripting”:

    select | grep scripting | wc

The select program reads the text that is currently selected on the display and prints the text on its output; the grep program reads its input and prints on its output the lines containing “scripting”; the wc program counts the number of lines on its input. Each of these programs can be used in numerous other situations to perform different tasks.

Figure 1. A comparison of various programming languages based on their level (higher-level languages execute more machine instructions for each language statement) and their degree of typing. System programming languages such as C tend to be strongly typed and medium level (five to 10 instructions per statement). Scripting languages such as Tcl tend to be weakly typed and very high level (100 to 1,000 instructions per statement). [Plot not reproduced: vertical axis “Instructions/statement” with ticks at 1, 10, 100, and 1,000; horizontal axis “Degree of typing” running from None to Strong; labeled points for Assembly, C, C++, Java, Tcl/Perl, and Visual Basic, grouped into regions marked “System programming” and “Scripting”.]

The strongly typed nature of system programming languages discourages reuse. It encourages programmers to create a variety of incompatible interfaces, each of which requires objects of specific types. The compiler prevents any other types of objects from being used with the interface, even if that would be useful. So to use a new object with an existing interface, the programmer must write conversion code to translate between the type of the object and the type expected by the interface. This in turn requires recompiling part or all of the application; many applications today are distributed in binary form so this is not possible.

To appreciate the advantages of a typeless language, consider the following Tcl command:

    button .b -text Hello! -font {Times 16} -command {puts hello}

This command creates a new button control that displays a text string in a 16-point Times font and prints a short message when the user clicks on the control. The command mixes six different types of things in a single statement: a command name (button), a button control (.b), property names (-text, -font, and -command), simple strings (Hello! and hello), a font name (Times 16) that includes a typeface name (Times) and a size in points (16), and a Tcl script (puts hello). Tcl represents all of these things uniformly with strings. In this example, the properties can be specified in any order and unspecified properties are given default values; more than 20 properties were left unspecified in the example.

The same example requires seven lines of code in two methods when implemented in Java. With C++ and Microsoft Foundation Classes (MFC), it requires about 25 lines of code in three procedures [1]. Just setting the font requires several lines of code in MFC:

    CFont *fontPtr = new CFont();
    fontPtr->CreateFont(16, 0, 0, 0, 700,
        0, 0, 0, ANSI_CHARSET,
        OUT_DEFAULT_PRECIS,
        CLIP_DEFAULT_PRECIS,
        DEFAULT_QUALITY,
        DEFAULT_PITCH|FF_DONTCARE,
        "Times New Roman");
    buttonPtr->SetFont(fontPtr);

Much of this code is a consequence of the strong typing. To set the font of a button, its SetFont method must be invoked, but this method must be passed a pointer to a CFont object. This in turn requires a new object to be declared and initialized. To initialize the CFont object its CreateFont method must be invoked, but CreateFont has a rigid interface that requires 14 different arguments to be specified. In Tcl, the essential characteristics of the font (typeface Times, size 16 points) can be used immediately with no declarations or conversions. Furthermore, Tcl allows the button’s behavior to be included directly in the command that creates the button, while C++ and Java require it to be placed in a separately declared method.

(In practice, a trivial example like this would probably be handled with a graphical development environment that hides the complexity of the underlying language. The user enters property values in a form and the development environment outputs the code. However, in more complex situations, such as conditional assignment of property values or interfaces generated programmatically, the developer must write code in the underlying language.)

It might seem that the typeless nature of scripting languages could allow errors to go undetected, but in practice scripting languages are just as safe as system programming languages. For example, an error will occur if the font size specified for the button example above is a noninteger string such as xyz. Scripting languages do their error checking at the last possible moment, when a value is used. Strong typing allows errors to be detected at compile time, so the cost of runtime checks is avoided. However, the price to be paid for efficiency is restrictions on how information can be used; this results in more code and less flexible programs.

Scripting languages are interpreted

Another key difference between scripting languages and system programming languages is that scripting languages are usually interpreted, whereas system programming languages are usually compiled. Interpreted languages provide rapid turnaround during development by eliminating compile times. Interpreters also make applications more flexible by allowing users to program the applications at runtime. For example, many synthesis and analysis tools for integrated circuits include a Tcl interpreter; users of the programs write Tcl scripts to specify their designs and control the operation of the tools. Interpreters also allow powerful effects to be achieved by generating code on the fly. For example, a Tcl-based Web browser can parse a Web page by translating the HTML for the page into a Tcl script using a few regular expression substitutions. It then executes the Tcl script to render the page on the screen.

Scripting languages are less efficient than system programming languages, in part because they use interpreters instead of compilers but also because their basic components are chosen for power and ease of use rather than an efficient mapping onto the underlying hardware. For example, scripting languages often use variable-length strings in situations where a system programming language would use a binary value that fits in a single machine word, and they use hash tables where system programming languages use indexed arrays.

Fortunately, the performance of a scripting language is not usually a major issue. Applications for scripting languages are generally smaller than applications for system programming languages, and the performance of a scripting application tends to be dominated by the performance of the components, which are typically implemented in a system programming language.

Scripting languages are higher level than system programming languages in the sense that a single statement does more work on average. A typical statement in a scripting language executes hundreds or thousands of machine instructions, whereas a typical statement in a system programming language executes about five machine instructions (as Figure 1 illustrates). Part of this difference is because scripting languages use interpreters, but much of the difference is because the primitive operations in scripting languages have greater functionality. For example, in Perl it is about as easy to invoke a regular expression substitution as it is to invoke an integer addition. In Tcl, a variable can have traces associated with it so that setting the variable causes side effects; for example, a trace might be used to keep the variable’s value updated continuously on the screen.

Scripting languages allow rapid development of gluing-oriented applications. Table 1 provides anecdotal support for this claim. It describes several applications that were implemented in a system programming language and then reimplemented in a scripting language or vice versa. In every case, the scripting version required less code and development time than the system programming version; the difference varied from a factor of two to a factor of 60. Scripting languages provided less benefit when they were used for the first implementation; this suggests that any reimplementation benefits substantially from the experiences of the first implementation and that the true difference between scripting and system programming is more like a factor of five to 10, rather than the extreme points of the table. The benefits of scripting also depend on the application. In the last example in Table 1, the GUI part of the application is gluing-oriented but the simulator part is not; this might explain why the application benefited less from scripting than other applications.

The information in the table was provided by various Tcl developers in response to an article posted on the comp.lang.tcl newsgroup [1].

DIFFERENT TOOLS FOR DIFFERENT TASKS

A scripting language is not a replacement for a system programming language or vice versa. Each is