Distill: Domain-Specific Compilation for Cognitive Models

Jan Veselý∗§, Raghavendra Pradyumna Pothukuchi∗§, Ketaki Joshi∗, Samyak Gupta†‡, Jonathan D. Cohen†, Abhishek Bhattacharjee∗
∗Yale University, USA    †Princeton University, USA

§Joint first authors of the research highlighted in this paper.
‡Samyak Gupta began contributing to this work as an undergraduate student at Rutgers University via a summer research project in 2019 at Yale University.

Abstract—Computational models of cognition enable a better understanding of the human brain and behavior, psychiatric and neurological illnesses, and clinical interventions to treat illnesses, and also offer a path towards human-like artificial intelligence. Cognitive models are also, however, laborious to develop, requiring composition of many types of computational tasks, and suffer from poor performance as they are generally designed using high-level languages like Python. In this work, we present Distill, a domain-specific compilation tool to accelerate cognitive models while continuing to offer cognitive scientists the ability to develop their models in flexible high-level languages. Distill uses domain-specific knowledge to compile Python-based cognitive models into LLVM IR, carefully stripping away features like dynamic typing and memory management that add performance overheads without being necessary for the underlying computation of the models. The net effect is an average of 27× performance improvement in model execution over state-of-the-art techniques using Pyston and PyPy. Distill also repurposes classical compiler data flow analyses to reveal properties about data flow in cognitive models that are useful to cognitive scientists. Distill is publicly available, integrated in the PsyNeuLink cognitive modeling environment, and is already being used by researchers in the brain sciences.

Index Terms—Domain-specific compilation, cognitive models, human brain, JIT compilers, Python.

I. INTRODUCTION

Computational models that simulate the processes underlying human cognition advance our understanding of the human brain and mind. They describe how stimuli are acted upon by various neural or mental mechanisms to produce cognitive function.
Insights from cognitive modeling have influenced not only the brain, psychological, and cognitive sciences, but also the field of artificial intelligence (AI), from the onset of artificial neural networks to recent advances in deep learning for gameplay and scientific computing [1], [2]. Cognitive models are expected to augment AI by offering brain-like intelligence not currently captured by deep learning (e.g., relational reasoning [3], planning [4], and more).

Cognitive models are computationally demanding. They are typically run hundreds of thousands of times to estimate the model parameters that best explain experimental data (e.g., human responses to psychological tasks), to assess the dynamics of cognitive processes over time steps, or to collect distributions of outcomes when the models include stochastic elements. Cognitive scientists typically use Python to rapidly prototype cognitive models using optimized scientific computing libraries [5]–[9]. Unfortunately, models developed in Python run slowly. As more sophisticated cognitive models are built to capture advanced brain processes, Python's inefficiency worsens to the extent that some cognitive models can take several days or weeks to run, hindering scientific progress.

Dynamic compilation tools like PyPy [10] and Pyston [11] can accelerate cognitive models, but only partially. PyPy and Pyston cannot optimize complex dependencies in cognitive models because of the runtime checks needed for Python's dynamic data structures and dynamic typing. Large-scale cognitive models also require integration of sub-models developed across environments (e.g., PyTorch [8], Emergent [12], NEURON [9], or PsyNeuLink [7]); it is difficult to design compilers that optimize across computations expressed in several environments. All these aspects of cognitive models also obscure the natural parallelism available in the models, and impede the ability to offload portions of the models onto hardware accelerators for which they are otherwise suitable.

New domain-specific languages for cognitive modeling would likely maximize performance, but require large-scale community buy-in and the porting of many models already built across many research institutions using Python. Additionally, cognitive models are heterogeneous, integrating components with varying levels of biological fidelity, developed in different frameworks and research groups; e.g., a single model can include neurally accurate descriptions of some brain structures, an artificial neural network from machine learning to determine the attention allocated to inputs, and a behavioral model of control to modulate the pathways. Extreme heterogeneity impedes the design of a canonical set of language constructs and software tools needed for a domain-specific language.

In response, we build Distill, a dynamic compilation tool that exploits the domain knowledge of cognitive modeling to generate efficient code for the models. Distill aggressively eliminates Python's dynamic code, and generates LLVM IR for all the components in a model, including those developed in ancillary environments (e.g., PyTorch) and the frameworks used to run the models (e.g., PsyNeuLink). Distill is inspired by the observation that cognitive scientists, like scientists in other communities [13], [14], use Python because it is flexible, easy to use, and provides access to optimized scientific computing libraries [5], [6], but do not need many of the dynamic language features that degrade performance. Eliding these dynamic features improves performance in itself, and also enables the use of existing optimizations in the LLVM framework to further boost performance. The choice of LLVM IR also permits leveraging LLVM's existing code generation backends for CPU architectures and accelerators, with no change.

Distill accelerates cognitive model execution by an average of 27× and a maximum of 923× compared to Pyston and PyPy for a suite of well-known cognitive models. Distill enables one widely-studied cognitive model to execute in under five seconds, even though it originally failed to complete within twenty-four hours. Distill also extracts parallelism from the models and targets multi-core CPUs and GPUs, resulting in additional 4.9× and 6.4× speedups, respectively.

Lowering entire cognitive models, including sub-models from other environments and the execution framework, into LLVM IR offers an additional benefit—the ability to use compiler analysis to infer semantic properties about the model. Distill automates several types of model-level analysis that have traditionally been undertaken manually by scientists in labor-intensive and tedious ways. These analyses also permit Distill to discover user-guided optimizations specific to cognitive models.

We demonstrate two examples of Distill's analyses and user-guided optimizations. First, Distill identifies cases where entire models can be verified to be equivalent, and, for some models, recognizes when certain complex nodes are equivalent with—and hence can be replaced by—simpler modules that have an analytical solution. Second, Distill calculates the impact of a cognitive model's parameters on its outputs and finds their optimal values entirely with compiler analysis built on LLVM's value range propagation and scalar evolution passes. Ordinarily, parameter estimation requires hundreds to thousands of model runs, but Distill automates this step, saving days to weeks of modeling effort. Moreover, because our enhancements to LLVM's passes are generally useful (i.e., extending support from integers to floating point), we have submitted a patch to the LLVM community for mainline integration.
Distill's design is guided by three main principles. First, we wish to avoid requiring cognitive scientists to change the source code of their models or frameworks. Second, we delegate performance extraction to the compiler, allowing scientists to focus on creating models in the manner most intuitive to them. Third, we minimize software engineering effort, reusing LLVM IR and its associated infrastructure.

To summarize, our research contributions are:
1) Distill, a compilation tool that exploits domain-specific knowledge to provide near-native execution speeds for cognitive models, along with support to offload computations to accelerators. Distill does not require changes to source code and reuses existing LLVM infrastructure.
2) The discovery that user-guided analyses and optimizations can be performed by compiler analysis, and the incorporation of this idea into Distill.
3) Evaluation of Distill-accelerated models on single- and multi-core CPUs and a GPU.

Distill is integrated with PsyNeuLink [7], a state-of-the-art cognitive modeling framework¹. PsyNeuLink, along with the emerging Model Description Format (MDF) [15], enables the import and execution of sub-models developed across various modeling environments. Distill is being used in several leading cognitive science research labs and in the classroom internationally. Overall, Distill enables the design of larger and more complex cognitive models than previously possible. This is an important and necessary step towards the longstanding research goal of understanding and replicating human cognition.

¹PsyNeuLink, integrated with Distill, is publicly available at https://github.com/PrincetonUniversity/PsyNeuLink/tree/master.

II. BACKGROUND AND MOTIVATION

A. Cognitive Models: Structure and Computation

Cognitive models are used to fit experimental data collected from humans performing psychological tasks, to simulate cognitive processes, to produce idealized outcomes, or for what-if analysis to understand the impact of tunable structures and parameters. Cognitive models are represented as graphs, where nodes are sub-processes or computational functions, and edges represent projections of signals between nodes. Nodes perform their computation when their activation conditions are met (e.g., the arrival of an input, or the passing of a specified time period).
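As a minimal illustration of this structure, consider the sketch below. The representation is our own, not any framework's actual data structures: nodes hold functions, edges carry signals, and each node's activation condition is consulted on every step.

# Toy sketch of a cognitive model graph: nodes hold functions, edges
# carry signals, and a node computes only when its activation
# condition is met. Illustrative only; real frameworks are richer.
import math

nodes = {
    "loc": dict(fn=lambda x: x, cond=lambda step: True, value=0.0),
    "obs": dict(fn=math.tanh, cond=lambda step: True, value=0.0),
    # "act" fires only on every second step (a simple activation condition)
    "act": dict(fn=lambda x: 1.0 if x > 0 else -1.0,
                cond=lambda step: step % 2 == 0, value=0.0),
}
edges = {"obs": "loc", "act": "obs"}   # each node reads one upstream node

def run_step(step, stimulus):
    nodes["loc"]["value"] = stimulus
    for name in ("obs", "act"):        # visit nodes in topological order
        node = nodes[name]
        if node["cond"](step):         # check the activation condition
            node["value"] = node["fn"](nodes[edges[name]]["value"])
    return nodes["act"]["value"]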
Figure 1 illustrates the predator-prey task [16] that is used to study the role of cognitive control in allocating attention. An intelligent agent, either a human or a non-human primate, is given a controller and shown a screen with three entities—a player that is positioned with the controller, a prey that the player must capture, and a predator that the player must avoid. The agent's attention is limited, and there is a cost for paying attention to an entity. Attention determines the accuracy with which the agent can perceive that entity's location. The agent does not have to distribute its attention fully.

[Figure: the model graph—Player, Predator, and Prey Loc nodes project to Obs nodes whose noise is modulated by Control; the Obs outputs feed Action, which is evaluated by Objective.]
Fig. 1: A computational cognitive model of an intelligent agent performing the predator-prey task.

The role of attention in modulating perceived locations is modeled through the Control, Objective, and Obs nodes. Control takes the exact 2-dimensional (2D) coordinates of all the entities (Loc values) per time step. The exact locations are obtained from an external environment, like the gameplay interface that is used by the agent. The Control node allocates attention to each of the entities, determining the variances of three 2D Gaussian distributions whose means are the actual locations of the entities. These distributions are sent to the Obs nodes, which sample from them to generate the observed locations. Finally, Action calculates the player's movement for the time step based on the observed positions of the three on-screen entities.
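As a concrete but simplified sketch of the computation in the Obs nodes, the NumPy snippet below samples an observed location from a Gaussian centered on the true location. The function name and the linear mapping from attention to variance are illustrative assumptions of ours, not PsyNeuLink's GaussianDistort implementation.

import numpy as np

def observe(true_loc, attention, rng, max_variance=25.0):
    # Higher attention -> lower variance -> more accurate perception.
    # The linear attention-to-variance mapping is an illustrative guess.
    variance = max_variance * (1.0 - attention)   # attention in [0, 1]
    return rng.normal(loc=true_loc, scale=np.sqrt(variance), size=2)

rng = np.random.default_rng(0)
prey_obs = observe(np.array([7.0, 1.0]), attention=0.9, rng=rng)  # sharp
pred_obs = observe(np.array([2.0, 5.0]), attention=0.1, rng=rng)  # noisy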
To identify the best allocation of attention for each entity, Control searches over all possible attention allocations, evaluating the cost of each allocation and the quality of the associated move. The cost of each allocation is calculated by Control, and the quality of the move is computed by Objective. Objective uses the direction given by Action and the true location of the prey to compute the goodness of the move. Control then selects the parameters that have the lowest cost. The entire process, from reading the input locations to searching over allocations, is repeated per time step until the prey or the player is captured. Scientists are interested in both the final outcome of the task and the temporal dynamics of decision-making.

Figure 1 shows the basic predator-prey model, but advanced variants can include the use of memory to recall previous locations, neural networks trained on experimental data to generate moves, or visual processors that extract locations from screen frames. These added components may be developed in PyTorch, Emergent, NEURON, and others.

B. The PsyNeuLink Framework

While cognitive models have historically been developed in many environments, recent efforts focus on a single "lingua franca" environment that can exchange models built across environments [15]. PsyNeuLink, the environment in which we prototype Distill, is prominent among emerging standardized environments that accept models specified in MDF, a common format to represent models from several environments including PyTorch and ONNX [17]. Despite its nascence, PsyNeuLink is already used in leading cognitive science research laboratories and classrooms worldwide (e.g., at Princeton University, Arizona State University, and the University of Leiden). Furthermore, the MDF project, to which an even larger body of researchers contributes [15], greatly expands the reach of PsyNeuLink.

Figure 2 shows the predator-prey task modeled in PsyNeuLink. Model nodes are referred to as Mechanisms (ProcessingMechanism or OptimizationControlMechanism), and the model is a Composition. Each node contains a function, which could be a PsyNeuLink-defined function (e.g., the GaussianDistort and GridSearch functions used in the prey_obs and control nodes, respectively) or user-defined (e.g., the action_fn in the action_mech node). Inputs are defined as a dictionary, and the composition is run as many times as specified by num_trials.

#Import psyneulink and numpy libraries
import psyneulink as pnl
import numpy as np
from psyneulink.core.components.functions…
from psyneulink.core.components.mechanisms…
…
#Loc nodes
prey_loc = ProcessingMechanism(size=2, name=…)
…
#Obs nodes with a PsyNeuLink library function
prey_obs = ProcessingMechanism(size=2, function=GaussianDistort, name=…)
…
#User defined action function
def action_fn(positions):
    predator_pos = positions[0]
    …
    #floating point computations to obtain move
    #e.g., numpy.sqrt, numpy.exp, …
    …
    return move
…
#Action node with the user defined function
action_mech = pnl.ProcessingMechanism(function=action_fn,
    input_ports=["obs_predator", …], name="Action", …)
…
#Compose nodes into a model
agent_comp = Composition(name="The Model")
agent_comp.add_node(action_mech)
…
#Add controller that uses the GridSearch function
control = OptimizationControlMechanism(agent_rep=agent_comp,
    function=GridSearch(…), …)
agent_comp.add_controller(control)
…
#Specify inputs
input_dict = {player_pos: [[XX,XX]], predator_pos: [[YY,YY]], prey_pos: [[ZZ,ZZ]]}
#Run
run_results = agent_comp.run(inputs=input_dict, num_trials=1000, mode=…)

Fig. 2: Specifying the predator-prey model in PsyNeuLink.

The PsyNeuLink library contains functions common in the cognitive sciences; users can also define their own functions. These functions perform numerical computations to model neural or mental processing, and contain a subset of Python. The current MDF specification [18], to which PsyNeuLink conforms, limits these functions to arithmetic, boolean, relational, and conditional operators, as well as lists of homogeneous types, tuples, arrays, Python built-in functions (e.g., sum, len, max, int and float conversion), numpy functions (e.g., tanh, exp, sqrt), and attributes (e.g., shape, flatten). MDF does not currently allow generators, list comprehensions, I/O operations, or try and except constructs in these functions. This is a common standard in this domain—e.g., other environments like NeuroML [19] for neuronal modeling, TorchScript for PyTorch model creation [20], and even generic high-performance dynamic compilation frameworks like Numba [21] use a similar subset of Python. This knowledge is not (but should be) used to aggressively optimize cognitive models.
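To illustrate the flavor of this subset, the example below (our own, not taken from the MDF specification) contrasts a conforming user-defined function with one that falls outside the subset.

import numpy as np

# Conforms to the subset: arithmetic and relational operators,
# whitelisted built-ins (max, sum), and numpy math (exp).
def softmax_decision(utilities):
    exp_u = np.exp(utilities - max(utilities))
    return exp_u / sum(exp_u)

# Does NOT conform: list comprehensions and try/except are outside
# the current MDF specification.
def nonconforming(utilities):
    try:
        return [u / sum(utilities) for u in utilities]
    except ZeroDivisionError:
        return []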
C. Cognitive Model Execution

Figure 3 shows the high-level steps behind running a model in PsyNeuLink. When the model is run (in the format shown in the last line of Figure 2), PsyNeuLink first performs a sanitization check to ensure that the nodes are properly connected. It runs through all nodes, initializing all parameters and inputs with default values and propagating inter-node signals. The shapes of each node's inputs and outputs in the sanitization run must match those used in the actual run.

#Sanitize the model
sanitize()
#Parse inputs
…
parsed_inputs = parse_run_inputs(inputs)
…
#Run each trial
results = []
for trial_num in range(num_trials):
    #Get the input for the trial
    input = parsed_inputs[trial_num % len(parsed_inputs)]
    #Execute
    trial_output = execute_trial(inputs=input, scheduler=…)
    …
    #Handle outputs
    results.append(trial_output)
return results
…
#Execute one trial
def execute_trial(inputs=…):
    …
    while not termination_cond:
        #Find all nodes ready to run in this iteration
        #and run them
        ready_nodes = scheduler.run(…)
        for node in ready_nodes:
            node.execute(inputs=…)
        …
    #Collect results from the output ports of nodes
    #marked as outputs
    result = []
    for node in output_nodes:
        result.append(node.output_port.value)
    return result

Fig. 3: Running a model in PsyNeuLink.

Next, PsyNeuLink prepares a list of inputs from the input dictionary, such that each element of the list is an input to the model for one trial. The model is run until the completion of the trial (function execute_trial in Figure 3). A trial terminates when certain conditions are met; e.g., after a move for the player has been selected. During the trial, the scheduler identifies the nodes that are ready to run in each iteration, based on the activation conditions that are explicitly specified per node. Examples of such conditions include waiting until other nodes have run a certain number of times, until the outputs of particular nodes stabilize, or until an amount of time has elapsed. Finally, the trial outputs are collected and returned.
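A framework-free sketch of such condition-driven scheduling is shown below. It mirrors the loop in Figure 3, but the condition helpers and state layout are our own simplifications, not PsyNeuLink's actual Scheduler and Condition classes.

# Minimal condition-driven scheduling: a node is ready when its
# condition holds over the execution counts so far.
def always(calls):
    return True

def every_n_calls(upstream, n):
    # Ready whenever the upstream node has completed a multiple of n calls.
    return lambda calls: calls[upstream] > 0 and calls[upstream] % n == 0

def run_trial(node_fns, conditions, num_iters=10):
    calls = {name: 0 for name in node_fns}
    for _ in range(num_iters):              # simplistic termination condition
        ready = [n for n in node_fns if conditions[n](calls)]
        for name in ready:
            node_fns[name]()                # execute the ready node
            calls[name] += 1
    return calls

conditions = {"A": always, "B": every_n_calls("A", 2)}
counts = run_trial({"A": lambda: None, "B": lambda: None}, conditions)
# "B" ends up running roughly half as often as "A".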
D. Shortcomings of Dynamic Compilation Tools

Existing dynamic compilation tools like Pyston [11] and PyPy [10] miss many optimization opportunities for cognitive models, for several reasons. First, they cannot easily identify opportunities to reduce the runtime overheads required for tracking control flow. For example, the predator-prey model in Section II-A is run many times for a single input, but the path of execution is the same for all these runs. This is typical of cognitive models, but takes significant resources for PyPy and Pyston to track.

Second, existing dynamic compilation tools do not fully eliminate unnecessary dynamic Python features; e.g., inter-node signals have a fixed type, and also a fixed shape across runs. Thus, dynamic Python structures such as lists and dictionaries that are used to hold these values can be safely compiled to static data structures. However, changing a data structure requires updating all the accesses to that structure in the entire program, and existing tools usually focus only on individual functions and cannot undertake such aggressive optimizations.

Third, Pyston and PyPy cannot optimize across computations from different frameworks, or across scheduling invocations between executions of the model nodes. When a model uses computations from multiple environments like PyTorch and PsyNeuLink, even if the separate components are compiled, optimization does not cross these frameworks. Additionally, execution frequently switches between nodes and the scheduling logic that identifies which nodes are ready to run. Transitions back and forth between the model nodes and scheduler logic limit the scope of compiler optimizations, switching execution between compiled and interpreted modes.

Finally, available dynamic compilation tools cannot automatically extract parallelism from the models or offload computations to accelerators like GPUs. This is a wasted opportunity, as there are several dimensions along which computations in cognitive models can be parallelized. For example, in the predator-prey model, the evaluations for each combination of attention allocations could have been run in parallel. When multiple samples are drawn from the distributions of observed locations, each sample and subsequent action could also be computed in parallel. While one might consider leveraging existing multithreading and GPU programming libraries for Python, they all require scientists to explicitly identify such parallel computations and mark functions to be offloaded to a GPU. A more desirable solution is to automate these steps so that cognitive scientists can focus solely on their designs rather than grapple with parallel programming constructs.

These shortcomings lead to cascading slowdowns. For example, unoptimized data structures not only have longer access times, but also block subsequent optimization passes by hindering the propagation of values and references. Multithreading with Python does not result in parallel execution because the threads are serialized by the Global Interpreter Lock [22], unless the threads run compiled code, in which case they do not have to hold the lock. To parallelize maximally, it is important to compile the Python threads.
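The effect of the GIL can be seen with a small experiment (an illustrative micro-benchmark of our own, not Distill code): threads running pure-Python bytecode serialize on the lock, whereas threads spending their time inside compiled NumPy kernels, which release the GIL, can overlap.

import threading, time
import numpy as np

def python_work():                    # bytecode loop: holds the GIL
    s = 0
    for i in range(5_000_000):
        s += i

def numpy_work():                     # compiled kernels release the GIL
    a = np.random.rand(1500, 1500)
    for _ in range(5):
        a = a @ a / 1500.0            # scale to keep values bounded

for work in (python_work, numpy_work):
    threads = [threading.Thread(target=work) for _ in range(4)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(work.__name__, round(time.perf_counter() - start, 2), "s")
# python_work shows no speedup over running the loops serially;
# numpy_work threads can overlap (subject to BLAS's own threading).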
III. DISTILL: DOMAIN-SPECIFIC COMPILATION FOR COGNITIVE MODELS

Cognitive models are computational graphs with complex scheduling rules. They are constructed in Python, and can fuse heterogeneous sub-models developed in multiple frameworks. Dynamic compilation with existing tools is not effective. Distill has domain-specific knowledge about the structure of model execution, and about the expressions and data structures used in the models (Sections II-B and II-C). It uses this knowledge to optimize the models to near-native execution speeds.

While Python's dynamic features ease model construction, they are not actually necessary for model execution.