As far as I know, there haven’t been any rigorous studies done on this topic.
While that is unfortunate, I completely understand why it is the case.
Large-scale educational experiments are challenging to do well and are influenced by a lot of different variables.
Creating a study where language feature influence was larger than the noise would be very challenging.
As such, I’m just going to throw out some ideas.
I feel that a language that leads well into other languages is one that isn’t too odd in either syntax or semantics so that students can move to a variety of other languages and paradigms without being surprised by the new language.
Let’s start with the easy part of this: use a language that isn’t too syntactically different from the ones they will learn next.
Most people would likely agree that you shouldn’t start in APL or J.
I see this as an argument against Scheme/Racket as well.
(I have to point out that every language will have strikes against it, and what matters is the total weight of pros and cons.
For me, the dynamic typing of Scheme/Racket is a bigger killer than the syntax for the reasons listed above, but everyone places different weights on these things and there are certainly some big positives for Scheme/Racket.)
I’d even go further and argue that if a language has too many control structures that aren’t seen in other places, that could be a problem as well.
For example, Python supports an else clause on loops that you won’t see in other languages.
The use of Boolean operators as shortcuts that I mentioned above could be seen as a fail here as well.
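To make these two constructs concrete, here is a small Python sketch. The function names are made up for illustration, and I'm assuming the Boolean "shortcut" in question is the common `x or default` idiom.

```python
def find_first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for v in values:
        if v < 0:
            result = v
            break
    else:
        # This clause runs only when the loop finished without hitting `break`.
        result = None
    return result


def display_name(name):
    # `or` returns its first truthy operand rather than a plain True/False,
    # so an empty string falls through to the default value.
    return name or "anonymous"
```

A student who reads `else` here as "the condition was false" will be surprised to learn that it really means "no break happened", and the `or` idiom only makes sense once you know that empty strings count as false.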
On the semantic side, I think that scope is again significant.
I have been told by a TA from Rice University that when students get to Java in their third semester, after having two semesters of Python, many find block scope to be confusing.
Outside of Python, PHP, and Ruby, block scope is the standard rule, and given that scope is something beginning students struggle with, having a first language that handles it differently than most others seems problematic to me for future growth.
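A minimal sketch of what the lack of block scope looks like in Python. The function `classify` is hypothetical and exists only to show where names remain visible.

```python
def classify(n):
    if n > 0:
        label = "positive"  # first assigned inside the if block
    for i in range(3):
        last = i            # first assigned inside the loop body
    # All three names are still visible here: Python scopes names to the
    # enclosing function, not to the indented block.
    return label, last, i
```

Calling `classify(5)` works, but `classify(-1)` raises an `UnboundLocalError` at the `return`, because `label` was never assigned on that path. In a block-scoped language, the equivalent code would typically be rejected before it ever ran.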
Python also has very unusual handling of default parameters to functions.
Consider the following code.
>>> def foo(a, b=[]):
...     b.append(a)
...     return b
...
>>> foo(3)
[3]
>>> foo(4)
[3, 4]
While the scope of b is only inside of foo, its memory is allocated at the same level as the declaration of foo and it is remembered from one invocation to the next.
I have yet to find another language that has this semantics for default values to functions.
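For comparison, here is a sketch of the shared-default behavior next to the usual workaround, which is to use `None` as a sentinel. Both function names are my own, chosen for illustration.

```python
def append_shared(a, b=[]):
    # The default list is created once, when the `def` statement runs,
    # and the same list object is reused on every call that omits b.
    b.append(a)
    return b


def append_fresh(a, b=None):
    # Common workaround: default to None and build a new list per call.
    if b is None:
        b = []
    b.append(a)
    return b
```

Two successive calls to `append_shared` without a second argument accumulate into the same list, while two such calls to `append_fresh` each start from an empty one. The workaround is simple, but it is exactly the kind of convention a novice has to be told about.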
Again on the semantic side, languages that have very different models of OO from the “norm” are also a red flag for me.
Python also has odd elements to its OO model, like the need to list self as the first argument for methods.
I can see how some might see this as a nice advantage for teaching showing how the object is available, but when the student moves to another language, it is yet one more thing to handle differently.
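A short sketch of what the explicit `self` is doing. The `Counter` class is hypothetical; the point is only that a method call is shorthand for passing the object as the first argument.

```python
class Counter:
    def __init__(self, start=0):
        self.count = start

    def increment(self, step=1):
        # `self` is an ordinary parameter that receives the object.
        self.count += step
        return self.count


c = Counter()
via_method = c.increment(2)             # the usual call syntax
via_function = Counter.increment(c, 3)  # same method, receiver passed explicitly
```

Both calls update the same object, so `via_method` is 2 and `via_function` is 5. This is the teaching advantage some people see, and also the habit a student has to drop when the next language makes the receiver implicit.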
Lastly, I feel very strongly that before graduating, students should be exposed to multiple paradigms of programming.
If a major never covers anything other than the OO-imperative style of Java and C++, its students will find it much harder to pick up other languages later that either aren’t object-oriented or that focus on functional instead of imperative programming.
For this reason, I actually like the idea of multi-paradigm languages for CS1 and CS2.
Of course, one has to keep in mind that multi-paradigm can break a lot of the other things we are looking for if it makes things more complex.
Interesting First Semester Assignments
One of the reasons that Java took off in the education space was the relative ease with which students could write graphical code, especially using Applets.
The Applets have long since died, but the ability to put interesting assignments into CS1 and CS2 is still very strong.
Interesting here can mean many things and goes well beyond graphics today to include things like playing with data, robotics, or socially relevant problems.
Regardless of the details, a general requirement is that you have strong library support, and hopefully ways of bringing in those libraries that don’t overly complicate things.
This is an area where JVM languages and Python excel.
Really low-level languages, like C, tend to fall down the most here because the amount of code that is required to do anything interesting is often quite large.
Other Factors
There are a number of other factors that impact my thoughts on picking a language for introductory programming.
I personally also appreciate languages that run on multiple platforms and are hopefully open source.
Those languages generally are better for allowing students to work on their own machines, regardless of their OS, and have tooling that is free of cost.
I also really like a language that works well for both CS1 and CS2.
I discuss that in more depth in an earlier post.
I also like to use a language and accompanying tools that are used professionally.
It doesn’t have to be a top 5 language.
I’m fine with top 20 because the reality is that no single language dominates today, so whatever language you pick, the ability to learn other languages is more significant than knowing a specific language.
The thing is, I’m not all that big on teaching languages and teaching environments in courses for CS majors.
The reason is pretty simple: I only have students for 4 years and a limited number of hours, and since I want to use the same language for CS1 and CS2, I don’t feel comfortable spending that much time teaching something that I know isn’t used anywhere outside of the classroom.
We also have to realize that the languages used in industry aren’t fixed and what we choose to teach in colleges has an impact on them.
One of the things people consider when they pick the language for a new project is whether they can hire enough developers of sufficient talent who know it.
The more colleges that teach a language, the more likely it is to be adopted in industry.
I’m quite certain that the broad adoption of Java in colleges was a major factor in its dominance in industry.
Similarly, the current growth in Python is inevitably fuelling its professional adoption.
So when you pick a language to teach, you might ask yourself, “Is this the language I want the software that runs my life to be written in?” If you don’t think you want your bank or the elevator you ride in using that language, perhaps you should pick a different one.
Why People Choose Python
The language that seems to be taking over early CS education right now is Python.
That is actually what prompted me to write this post because I worry about this particular choice for introductory language.
There is no doubt that Python has some real advantages in the fact that it has low overhead in CS1 and that there is broad library support to enable students to do interesting things.
Unfortunately, it falls down in other areas.
The ones I worry most about are an inability to cover certain topics that I consider significant and the possibility for students to pick up bad habits that the language doesn’t prevent.
Perhaps the most standard argument that I hear for Python is that it is “simple” and easy for students to learn.
While I completely agree that Python is an easy language for experienced programmers to pick up, it isn’t at all clear that being easy for experienced programmers to learn is the same as being easy for beginning developers to learn.
Learning to program isn’t just about syntax.
It is about learning a new way of thinking, how to structure logic, and the general semantics of that syntax.
Having a language that enforces more structure and which gives early feedback when rules are broken could actually be very useful for novices.
Some evidence for this was provided by Alzahrani et al. (2018), who found that students in a large data sample struggle more with Python than they do with C++.
Indeed, I would argue that moving from a language that provides more error checking to one that provides less error checking is generally easier than going the other way, and that more assistance from the language to write good code is especially beneficial for the novice.
Having a language that helps the student to build their mental model of the semantics of programming could actually be more significant than having a simple syntax.
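As a rough illustration of what late feedback looks like, consider this sketch. The function `describe` and its pass mark of 50 are invented for the example.

```python
def describe(score):
    if score >= 50:
        return "pass"
    # Bug: this concatenates a str and an int. Python accepts the
    # definition and only complains when a call actually reaches this line.
    return "fail by " + (50 - score)
```

A student who only tests with passing scores never sees the problem. The first call with a failing score raises a `TypeError` at runtime, whereas a statically typed language would generally refuse to compile the equivalent code at all.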
I feel compelled to mention that another common argument for Python is that it teaches indentation.
There is definitely truth to this, but I can’t say I find this very motivating, especially since Python lacks the block scope that the indented blocks indicate.
The reality is that auto-formatting tools for languages without significant whitespace are well developed and companies with strict style guides can easily have them enforced by software.
In contrast to this, automatically cleaning up poor type usage in programs written for dynamically typed languages is a far more complex problem.
In some ways, the true challenge of teaching introductory programming using Python is probably summed up well by terms frequently seen in discussions of Python programs.
It doesn’t take much reading on Python to come across the term “Pythonic”.
When I asked a Python programmer about functions that return different types based on the argument values, like xs on Pandas’ DataFrames, I was told that was un-Pythonic.
The problem is that Pythonic and un-Pythonic are advanced concepts.
When you are dealing with students who are still trying to understand the basics of conditionals, iterations, and functions, they simply aren’t prepared to comprehend “Pythonic”.
Instead of having good coding done by convention, those students need a language that does more to enforce good style.
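To show the kind of function I have in mind without pulling in Pandas, here is a plain-Python sketch. The helper `select` and the `"group"` key are hypothetical, and this is an analogy, not a description of how any library behaves.

```python
def select(rows, key):
    """Return the matching row if exactly one matches, else a list of matches."""
    matches = [r for r in rows if r["group"] == key]
    if len(matches) == 1:
        return matches[0]  # a single dict
    return matches         # a list of dicts, possibly empty
```

Nothing in the language stops a novice from writing this, and nothing warns the caller that the result might be a dict in one call and a list in the next. Only convention says not to do it.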
My Choice for CS1 and CS2
If you’ve gotten this far, you have already shown great perseverance in our current age of short attention spans.
You might be wondering what language I would pick for teaching CS1 and CS2.
I actually like Scala as the language for these courses.
We’ve been using Scala in this role at Trinity University for almost 10 years now, and I have very few complaints.
As I’ve said before in this post, no language is perfect, but I find that the pros for Scala definitely outweigh the cons.
I have some older blog posts (here and here) where I discuss this in more detail as part of my thoughts from the first ~5 years of using Scala.
I’ll just list some of the highlights here.
REPL and Scripting interface for CS1.
Static type checking and lots of syntax errors instead of runtime errors or logic errors.
Type errors prevent a lot of issues.
CS1 students should never see a NullPointerException.
Syntax/semantics allow coverage of things I care about including const/mutability, block scope, OO, subtyping, parametric types, etc.
Access to the full JVM for libraries.
Expressive syntax that, combined with libraries, allows me to give interesting assignments that don’t take thousands of lines to code.
Solid APIs that include links to the types for arguments and return values.
In CS2 I can cover multithreading and networking.
I can cover CS1 and CS2 topics without the language forcing me to talk about things I’m not ready to cover yet.
Uniform OO syntax without primitives.
Fairly standard OO model.
Multiparadigm so students have a nice path to C++, Haskell, Java, etc.