2.1. Java Data Structures and File I/O
Continuing from last time’s reading, here is the code example we will be looking at again:
 1  import java.util.Scanner;
 2
 3  public class TempConv {
 4      public static void main(String[] args) {
 5          Double fahr;
 6          Double cel;
 7          Scanner in;
 8
 9          in = new Scanner(System.in);
10          System.out.println("Enter the temperature in F: ");
11          fahr = in.nextDouble();
12
13          cel = (fahr - 32) * 5.0/9.0;
14          System.out.println("The temperature in C is: " + cel);
15      }
16  }
2.1.1. Imports
In Java, you can use any class that is available without having to import the class, subject to two very important conditions:
The javac and java commands must know that the class exists.
You must use the full name of the class.
Your first question might be: how do the java and javac commands know that certain classes exist? The answer is as follows:
Java knows about all the classes that are defined in .java and .class files in your current working directory (by default, that is, when no CLASSPATH is set).
Java knows about all the classes that are shipped with Java.
Java knows about all the classes that are included in your CLASSPATH environment variable. Your CLASSPATH environment variable can name two kinds of structures:
A .jar file that contains Java classes.
Another directory that contains Java class files.
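If you are ever unsure where Java is looking, you can ask the running JVM. The small program below is a sketch; the class name ClassPathDemo and the helper classPathEntries are made up for illustration. It reads the standard java.class.path system property. When you start a program with the ordinary java launcher, that property holds the -cp/-classpath value if you gave one, otherwise the CLASSPATH environment variable, otherwise the current directory.

```java
import java.io.File;

// Hypothetical example class: prints the class path the JVM is searching.
public class ClassPathDemo {

    // Splits the java.class.path system property into its entries.
    // File.pathSeparator is ":" on Linux/macOS and ";" on Windows.
    static String[] classPathEntries() {
        String cp = System.getProperty("java.class.path", "");
        return cp.split(File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String entry : classPathEntries()) {
            // Each entry names a directory or a .jar file.
            System.out.println(entry);
        }
    }
}
```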
You can think of the import statement in Java as working a little bit like the from module import xxx statement in Python. Behind the scenes, however, the two statements actually do very different things.

The first important difference to understand is that the class naming system in Java is very hierarchical. The full name of the Scanner class is really java.util.Scanner. You can think of this name as having two parts: the first part, java.util, is called the package, and the last part is the class. We’ll talk more about the class naming system a bit later.

The second important difference is that it is the Java class loader’s responsibility to load classes into memory, not the import statement’s. In Python, by contrast, import actually loads and runs the module.
So, what exactly does the import statement do? It tells the compiler that we are going to use a shortened version of the class’s name. In this example we are going to use the class java.util.Scanner, but we can refer to it as just Scanner. We could use the java.util.Scanner class without any problem and without any import statement, provided that we always referred to it by its full name. As an experiment, you may want to try this yourself: remove the import statement and change the string Scanner to java.util.Scanner in the rest of the code. The program should still compile and run.
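For reference, here is one way the experiment might turn out. This sketch is a variant of TempConv with no import statement; every mention of Scanner is spelled out as java.util.Scanner. The class name TempConvNoImport and the toCelsius helper are additions for illustration. It also uses the primitive double rather than Double, which behaves the same way here.

```java
// Hypothetical variant of TempConv: no import statement at all.
public class TempConvNoImport {

    // Conversion pulled into a helper so it can be checked without typing input.
    static double toCelsius(double fahr) {
        return (fahr - 32) * 5.0 / 9.0;
    }

    public static void main(String[] args) {
        // Full class name instead of the short name Scanner.
        java.util.Scanner in = new java.util.Scanner(System.in);
        System.out.println("Enter the temperature in F: ");
        double fahr = in.nextDouble();
        System.out.println("The temperature in C is: " + toCelsius(fahr));
    }
}
```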
2.1.2. Input / Output / Scanner
In the previous section we created a Scanner object. In Java, Scanner objects make getting input from the user, a file, or even over the network relatively easy. In our case we simply want to ask the user to type in a number at the command line, so on line 9 we construct a Scanner by calling the constructor and passing it the System.in object. Notice that this Scanner object is assigned to the name in, which we declared to be a Scanner on line 7.

System.in is similar to System.out except, of course, it is used for input. If you are wondering why we must create a Scanner to read data from System.in when we can write data directly to System.out using println, you are not alone. We will discuss the reasons later, when we talk about Java streams in depth.

You will also see in other examples that we can create a Scanner by passing it a File object. You can think of a Scanner as a kind of “adapter” that makes low-level objects easier to use.
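To see the “adapter” idea in action, here is a small sketch that hands a Scanner a plain String instead of System.in or a File (Scanner has a constructor that takes a String). The class and method names are hypothetical. The point is that the same hasNextInt/nextInt calls work no matter what the Scanner is wrapping.

```java
import java.util.Scanner;

// Hypothetical example: the same Scanner methods work on any source.
public class ScannerSources {

    // Adds up integers from the text, stopping at the first non-integer token.
    static int sumOfInts(String text) {
        Scanner sc = new Scanner(text);   // wrap a String instead of System.in
        int total = 0;
        while (sc.hasNextInt()) {
            total += sc.nextInt();
        }
        sc.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfInts("10 20 12"));    // prints 42
        System.out.println(sumOfInts("1 2 stop 3"));  // prints 3: "stop" ends the loop
    }
}
```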
On line 11 we use the Scanner object to read in a number. Here again we see the implications of Java being a statically typed language. Notice that we must call the method nextDouble because the variable fahr was declared as a Double. So Scanner must provide a separate method for each kind of value we might want to read. In this case we need to read a Double, so we call nextDouble. The compiler checks the types in assignment statements like this one. If you try to assign the result of a method call to the wrong kind of variable, it will be flagged as an error.
The table below shows some commonly used methods of the Scanner class. There are many more methods supported by this class, and we will talk about how to find them in our chapter about Java Documentation.
Return type | Method name | Description
---|---|---
boolean | hasNext() | returns true if more data is present
boolean | hasNextInt() | returns true if the next thing to read is an integer
boolean | hasNextFloat() | returns true if the next thing to read is a float
boolean | hasNextDouble() | returns true if the next thing to read is a double
int | nextInt() | returns the next thing to read as an int
float | nextFloat() | returns the next thing to read as a float
double | nextDouble() | returns the next thing to read as a double
String | next() | returns the next thing to read as a String
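The hasNextXxx methods let you look at the next token before you commit to reading it. The sketch below labels each token of a String by the first test it passes; the class and method names are hypothetical. It pins the Scanner to Locale.US so that 2.5 is read as a decimal number regardless of your system’s language settings. That locale choice is an assumption of the example.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical example: classify tokens with the hasNextXxx() methods.
public class TokenKinds {

    static String describe(String text) {
        Scanner sc = new Scanner(text);
        sc.useLocale(Locale.US);          // "." is the decimal point in this locale
        StringBuilder out = new StringBuilder();
        while (sc.hasNext()) {
            // Check int before double: an int token would also pass hasNextDouble().
            if (sc.hasNextInt()) {
                out.append("int:").append(sc.nextInt());
            } else if (sc.hasNextDouble()) {
                out.append("double:").append(sc.nextDouble());
            } else {
                out.append("string:").append(sc.next());
            }
            out.append(' ');
        }
        sc.close();
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("42 2.5 hello"));
        // prints: int:42 double:2.5 string:hello
    }
}
```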
2.1.3. List
Next, let’s look at a program which reads numbers from a file and produces a histogram showing the frequency of the numbers. The data file we will use has one number between 0 and 9 on each line of the file. Here is a simple Python program that creates and prints a histogram.
def main():
    count = [0]*10
    data = open('test.dat')

    for line in data:
        count[int(line)] = count[int(line)] + 1

    idx = 0
    for num in count:
        print(idx, "occurred", num, "times.")
        idx += 1

main()
When you run the program, it reads this data:
test.dat
1
2
3
9
1
Let’s review what is happening in this little program. First, we create a list with 10 positions, each initialized to 0. Next, we open the data file called test.dat. Third, we have a loop that reads each line of the file. As we read each line, we convert it to an integer and increment the counter at the position in the list indicated by the number on that line. Finally, we iterate over each element in the list, printing out both the position in the list and the total value stored in that position.
To write the Java version of this program we will have to introduce several new Java concepts. First, you will see the Java equivalent of a list, called an ArrayList. Next, you will see three different kinds of loops used in Java. Two of them will look very familiar. The third is different from what you are used to in Python, but it is easy once you understand the syntax:
while (condition) { code }
The code will be repeatedly executed until the condition becomes false.
for (initialization statement; condition; loop statement) { code }
The code will be repeatedly executed until the condition becomes false. As shown in the example below, the initialization statement and loop statement make this form useful for iterating over a range of numbers, similar to how you might use for i in range(10) in Python.
for (Type variable : collection) { code }
The code will be executed once for each element in the collection. On each execution, variable is assigned the next element of collection. This form is known as the “for-each” loop. It is useful for iterating over the members of a collection, similar to how you might use for a in array in Python.
Note
For the first lectures, as we get used to Java, we’ll focus on the while loop and the standard for loop. We’ll touch on the for-each loop more later in the semester.
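Here is a small side-by-side sketch of the three loop forms; the class and method names are hypothetical. The while loop builds a list, the standard for loop walks it by index, and the for-each loop walks it element by element.

```java
import java.util.ArrayList;

// Hypothetical example: the three Java loop forms side by side.
public class ThreeLoops {

    // while: keep going as long as the condition is true.
    static ArrayList<Integer> buildList() {
        ArrayList<Integer> nums = new ArrayList<Integer>();
        int i = 0;
        while (i < 3) {
            nums.add(i * 10);
            i++;
        }
        return nums;                      // [0, 10, 20]
    }

    // Standard for: initialization; condition; update (like range(len(nums))).
    static int sumByIndex(ArrayList<Integer> nums) {
        int total = 0;
        for (int j = 0; j < nums.size(); j++) {
            total += nums.get(j);
        }
        return total;
    }

    // for-each: one pass per element (like "for n in nums:").
    static int sumByElement(ArrayList<Integer> nums) {
        int total = 0;
        for (Integer n : nums) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> nums = buildList();
        System.out.println(nums);                // prints [0, 10, 20]
        System.out.println(sumByIndex(nums));    // prints 30
        System.out.println(sumByElement(nums));  // prints 30
    }
}
```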
Here is the Java code needed to write the exact same histogram program:
 1  import java.util.Scanner;
 2  import java.util.ArrayList;
 3  import java.io.File;
 4  import java.io.IOException;
 5
 6  public class Histo {
 7
 8      public static void main(String[] args) {
 9          Scanner data = null;
10          ArrayList<Integer> count;
11          Integer idx;
12
13          try {
14              data = new Scanner(new File("test.dat"));
15          }
16          catch (IOException e) {
17              System.out.println("Unable to open data file");
18              e.printStackTrace();
19              System.exit(1);
20          }
21
22          count = new ArrayList<Integer>(10);
23          for (Integer i = 0; i < 10; i++) {
24              count.add(i, 0);
25          }
26
27          while (data.hasNextInt()) {
28              idx = data.nextInt();
29              count.set(idx, count.get(idx) + 1);
30          }
31
32          idx = 0;
33          for (Integer i : count) {
34              System.out.println(idx + " occurred " + i + " times.");
35              idx++;
36          }
37      }
38  }
Before going any further, I suggest you try to compile the above program and run it on some test data that you create.
Now, let’s look at what is happening in the Java source. As usual, we declare the variables we are going to use at the beginning of the method. In this example we are declaring a Scanner variable called data, an Integer called idx, and an ArrayList called count. However, there is a new twist to the ArrayList declaration. Unlike Python, where lists can contain just about anything, in Java we let the compiler know what kind of objects our ArrayList is going to contain. In this case the ArrayList will contain Integers. The syntax we use to declare what kind of object the list will contain is the <Type> syntax.
Technically, you don’t have to declare what is going to be in an ArrayList. The compiler will allow you to leave the <Type> off the declaration. If you don’t tell Java what kind of object is going to be on the list, Java will give you a warning message like this:
Note: Histo.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Without the <Integer> part of the declaration, Java simply assumes that any object can be on the list, and every element you get back is treated as a plain Object. Without resorting to an ugly notation called casting, you cannot use those elements as Integers at all, not even to add 1 to them! So, if you forget the <Integer>, you will surely see more errors later in your code. (Try it and see what you get.)
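If you want to see the difference without breaking Histo.java, this sketch puts the two styles next to each other; the class and method names are hypothetical. Compiling it should produce the same “unchecked” note shown above, because of the raw ArrayList in rawSum.

```java
import java.util.ArrayList;

// Hypothetical example: a raw ArrayList versus ArrayList<Integer>.
public class RawVsGeneric {

    // Raw type: no <Type>, so the compiler treats every element as an Object.
    static int rawSum() {
        ArrayList raw = new ArrayList();
        raw.add(3);
        raw.add(4);
        // get() returns Object here, so each value must be cast before adding.
        return (Integer) raw.get(0) + (Integer) raw.get(1);
    }

    // Generic type: the compiler knows the elements are Integers, so no casts are needed.
    static int genericSum() {
        ArrayList<Integer> typed = new ArrayList<Integer>();
        typed.add(3);
        typed.add(4);
        return typed.get(0) + typed.get(1);
    }

    public static void main(String[] args) {
        System.out.println(rawSum());      // prints 7
        System.out.println(genericSum());  // prints 7
    }
}
```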
2.1.4. Exception Handling with try/catch
Lines 13–20 are required to open the file. Why so many lines to open a file in Java? The additional code mainly comes from the fact that Java forces you to reckon with the possibility that the file you want to open is not there. If you attempt to open a file that does not exist, the Scanner constructor throws a FileNotFoundException. Because this is a checked exception, the compiler refuses to compile the program unless you either catch it or declare that your method throws it. A try/catch construct allows us to try things that are risky and to recover gracefully if an error occurs. The following example shows the general structure of a try/catch block.
try {
    // Put some risky code in here, like opening a file
}
catch (Exception e) {
    // If an error happens in the try block, an exception is thrown.
    // We will catch that exception here!
}
Notice that on line 16 we are catching an IOException. The Scanner constructor actually throws a FileNotFoundException, but FileNotFoundException is a subclass of IOException, so a catch block for IOException catches it too. We will see later that we can have multiple catch blocks to catch different types of exceptions. If we want to be lazy and catch any old exception, we can catch an Exception, which is the parent of all exceptions. However, catching Exception is a terrible practice, since you may inadvertently catch exceptions you do not intend to, making it harder to identify bugs in your program.
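Here is a sketch of a try block with two catch blocks. Each one handles a specific exception instead of catching Exception. The class name, method name, and file name are made up. FileNotFoundException is what new Scanner(File) throws for a missing file, and NumberFormatException is what Integer.parseInt throws for text that is not an integer.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical example: one try block, two specific catch blocks.
public class MultiCatch {

    // Reads the first token of a file and converts it to an int,
    // reporting which kind of failure happened, if any.
    static String firstNumber(String fileName) {
        try {
            Scanner sc = new Scanner(new File(fileName));  // may throw FileNotFoundException
            String token = sc.hasNext() ? sc.next() : "";
            sc.close();
            return "read " + Integer.parseInt(token);      // may throw NumberFormatException
        }
        catch (FileNotFoundException e) {
            return "missing file: " + fileName;
        }
        catch (NumberFormatException e) {
            return "not a number";
        }
    }

    public static void main(String[] args) {
        // Assumes no file with this made-up name exists in the current directory.
        System.out.println(firstNumber("no_such_file.dat"));
    }
}
```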
On line 22 we create our ArrayList and give it an initial capacity of 10. The capacity is only a hint about how much storage to reserve. The new list is still empty, and its size() is 0 until we add elements. Strictly speaking, it is not necessary to give the ArrayList any capacity at all. It will grow or shrink dynamically as needed, just like a list in Python. On line 23 we start the first of three loops. The for loop on lines 23–25 serves the same purpose as the Python statement count = [0]*10; that is, it initializes the first 10 positions in the ArrayList to hold the value 0.
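The difference between an ArrayList’s capacity and its size trips up many newcomers, so here is a quick sketch; the class and method names are hypothetical. It also shows Collections.nCopies, a standard library method that gives a one-line alternative to the initialization loop on lines 23–25. The original program does not use it.

```java
import java.util.ArrayList;
import java.util.Collections;

// Hypothetical example: capacity is not size.
public class CapacityVsSize {

    // 10 is only a capacity hint; the returned list is still empty.
    static ArrayList<Integer> emptyWithCapacity() {
        return new ArrayList<Integer>(10);
    }

    // One-line equivalent of Python's [0]*10: ten copies of 0.
    static ArrayList<Integer> tenZeros() {
        return new ArrayList<Integer>(Collections.nCopies(10, 0));
    }

    public static void main(String[] args) {
        ArrayList<Integer> count = emptyWithCapacity();
        System.out.println(count.size());   // prints 0
        // count.set(3, 1) at this point would throw IndexOutOfBoundsException.

        ArrayList<Integer> zeros = tenZeros();
        System.out.println(zeros.size());   // prints 10
        System.out.println(zeros.get(9));   // prints 0
    }
}
```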
The syntax of the for loop on line 23 probably looks very strange to you, but it is not too different from what happens in Python using range. The loop for (Integer i = 0; i < 10; i++) is equivalent to the Python for i in range(10). The first statement inside the parentheses declares and initializes a loop variable i. The second statement is a Boolean expression that is our exit condition. In other words, we will keep looping as long as this expression evaluates to true. The third clause is used to increment the value of the loop variable at the end of each iteration through the loop. Here i++ is Java shorthand for i = i + 1. Java also supports the shorthand i-- to decrement the value of i. Like Python, you can also write i += 2 as shorthand for i = i + 2.
Try to rewrite the following Python for loops as Java for loops:
for i in range(2,101,2)
for i in range(1,100)
for i in range(100,0,-1)
for x,y in zip(range(10),range(0,20,2)) [hint: you can separate statements in the same clause with a comma]
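If you want to check your answers, here is one possible set of translations. Other correct versions exist. Each loop is written as a small method that adds up, or records, the values it visits; the class and method names are hypothetical.

```java
// Hypothetical example: Python range() loops rewritten as Java for loops.
public class RangeLoops {

    // for i in range(2,101,2): the even numbers 2, 4, ..., 100
    static int sumEvens() {
        int total = 0;
        for (int i = 2; i < 101; i += 2) {
            total += i;
        }
        return total;
    }

    // for i in range(1,100): 1, 2, ..., 99 (the stop value is excluded)
    static int sumOneTo99() {
        int total = 0;
        for (int i = 1; i < 100; i++) {
            total += i;
        }
        return total;
    }

    // for i in range(100,0,-1): 100, 99, ..., 1 (counting down)
    static int sumCountdown() {
        int total = 0;
        for (int i = 100; i > 0; i--) {
            total += i;
        }
        return total;
    }

    // for x,y in zip(range(10),range(0,20,2)): two variables updated together
    static String lastPair() {
        String last = "";
        for (int x = 0, y = 0; x < 10 && y < 20; x++, y += 2) {
            last = x + "," + y;
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(sumEvens());      // prints 2550
        System.out.println(sumOneTo99());    // prints 4950
        System.out.println(sumCountdown());  // prints 5050
        System.out.println(lastPair());      // prints 9,18
    }
}
```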
The next loop (lines 27–30) shows a typical Java pattern for reading data from a file. Java while loops and Python while loops are identical in their logic. In this case, we will continue to process the body of the loop as long as data.hasNextInt() returns true.
Line 29 illustrates another important difference between Python and Java. Notice that in Java we cannot write count[idx] = count[idx] + 1. This is because Java does not let classes overload operators, and the square-bracket indexing operator works only on arrays, not on ArrayList objects. Everything except the most basic math and logical operations is done using methods. So, to set the value of an ArrayList element we use the set method. The first parameter of set indicates the index, or position, in the ArrayList we are going to change. The next parameter is the value we want to set. Notice that, once again, we cannot use the square-bracket indexing operator to retrieve a value from the list; instead we must use the get method.
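Here is the counting step pulled out on its own, so you can see set and get doing the job that square brackets do in Python. The class name and the tally helper are hypothetical, and the sample values mirror the contents of test.dat.

```java
import java.util.ArrayList;

// Hypothetical example: get/set in place of Python's count[idx] indexing.
public class SetGet {

    static ArrayList<Integer> tally(int[] values) {
        ArrayList<Integer> count = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            count.add(0);                    // like count = [0]*10
        }
        for (int v : values) {
            // Python: count[v] = count[v] + 1
            count.set(v, count.get(v) + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        ArrayList<Integer> count = tally(new int[] {1, 2, 3, 9, 1});
        System.out.println(count);          // prints [0, 2, 1, 1, 0, 0, 0, 0, 0, 1]
        System.out.println(count.get(1));   // prints 2
    }
}
```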
The last loop in this example is similar to the Python for loop where the object of the loop is a sequence. In Java we can use this kind of for loop over arrays and over all kinds of collections, which are provided by the Collection classes in Java. The for loop on line 33, for (Integer i : count), is equivalent to the Python loop for i in count:. This loop iterates over all of the elements in the ArrayList called count. Each time through the loop, the Integer variable i is bound to the next element of the ArrayList. If you tried the experiment of removing the <Integer> part of the ArrayList declaration, you probably noticed that you had an error on this line. Why?
2.1.5. Arrays
As I said at the outset of this section, we are going to use Java ArrayLists because they are easier to use and more closely match the way that Python lists behave. However, if you look at Java code on the internet or even in your Core Java books, you are going to see examples of something called arrays. In fact, you have already seen one example of an array declared in the ‘Hello World’ program (String[] args). Let’s rewrite this program to use primitive arrays rather than array lists.
Note
This section moves a little quickly through arrays. In the course, we will practice more with arrays in the assignments before moving on to ArrayLists.
 1  import java.util.Scanner;
 2  import java.io.File;
 3  import java.io.IOException;
 4
 5  public class HistoArray {
 6      public static void main(String[] args) {
 7          Scanner data = null;
 8          Integer[] count = {0,0,0,0,0,0,0,0,0,0};
 9          Integer idx;
10
11          try {
12              data = new Scanner(new File("test.dat"));
13          }
14          catch (IOException e) {
15              System.out.println("Unable to open data file");
16              e.printStackTrace();
17              System.exit(1);
18          }
19
20          while (data.hasNextInt()) {
21              idx = data.nextInt();
22              count[idx] = count[idx] + 1;
23          }
24
25          idx = 0;
26          for (Integer i : count) {
27              System.out.println(idx + " occurred " + i + " times.");
28              idx++;
29          }
30      }
31  }
The main difference between this example and the previous example is that we declare count to be an array of Integers. We can also initialize short arrays directly using the syntax shown on line 8. Then notice that on line 22 we can use the square-bracket notation to index into an array.
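A few array details are worth seeing in code. This sketch shows three things; its class and method names are hypothetical. First, new int[10] starts out filled with zeros. Second, an array’s length is a field rather than a method. Third, an Integer[] created with new starts out filled with null. That last point is why HistoArray fills count with explicit zeros: with null elements, count[idx] + 1 would throw a NullPointerException.

```java
// Hypothetical example: basic array behavior.
public class ArrayBasics {

    // Counts how often each digit 0-9 appears, using a primitive int array.
    static int[] histogram(int[] data) {
        int[] count = new int[10];         // int elements start at 0
        for (int i = 0; i < data.length; i++) {
            count[data[i]] = count[data[i]] + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] count = histogram(new int[] {1, 2, 3, 9, 1});
        System.out.println(count.length);  // prints 10 (length is a field, no parentheses)
        System.out.println(count[1]);      // prints 2

        Integer[] boxed = new Integer[10]; // object elements start as null, not 0
        System.out.println(boxed[0]);      // prints null
    }
}
```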
2.1.6. Dictionary
Just as Python provides the dictionary when we want easy access to key-value pairs, Java provides a similar mechanism. Rather than the dictionary terminology, Java calls these objects Maps. Java provides several implementations of a map; the two most common are the TreeMap and the HashMap. As you might guess, the TreeMap uses a balanced binary tree behind the scenes, which keeps its keys in sorted order, and the HashMap uses a hash table.
Note
We will cover the details of maps, binary trees, and hash tables later in the semester, so you don’t have to worry about the details of them right now – just know that Java Maps are similar to the functionality of Python dictionaries.
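Because a TreeMap is built on a tree, it keeps its keys in sorted order, while a HashMap makes no promise about order. This sketch puts the same three keys into each kind of map; the class and method names are hypothetical. Both classes implement the Map interface, so the same put and keySet calls work on either one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: TreeMap versus HashMap key order.
public class MapOrder {

    // Adds three entries and returns the keys in the order the map yields them.
    static String fillAndListKeys(Map<String, Integer> m) {
        m.put("pear", 1);
        m.put("apple", 2);
        m.put("fig", 3);
        return m.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(fillAndListKeys(new TreeMap<String, Integer>()));
        // prints [apple, fig, pear]: sorted order
        System.out.println(fillAndListKeys(new HashMap<String, Integer>()));
        // prints the same three keys, in an order that is not guaranteed
    }
}
```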
Let’s stay with a simple frequency-counting example, only this time we will count the frequency of words in a document. A simple Python program for this job could look like this:
def main():
    data = open('alice30.txt')
    wordList = data.read().split()
    count = {}
    for w in wordList:
        w = w.lower()
        count[w] = count.get(w,0) + 1

    keyList = sorted(count.keys())
    for k in keyList:
        print("%-20s occurred %4d times" % (k, count[k]))

main()
alice30.txt
Down, down, down. Would the fall NEVER come to an end! 'I wonder how many miles I've fallen by this time?' she said aloud. 'I must be getting somewhere near the centre of the earth. Let me see: that would be four thousand miles down, I think--' (for, you see, Alice had learnt several things of this sort in her lessons in the schoolroom, and though this was not a VERY good opportunity for showing off her knowledge, as there was no one to listen to her, still it was good practice to say it over) '--yes, that's about the right distance--but then I wonder what Latitude or Longitude I've got to?' (Alice had no idea what Latitude was, or Longitude either, but thought they were nice grand words to say.)
Note
If you want to try out this program yourself, copy the above text into a file called alice30.txt and save it in the same folder as the program.
Notice that the structure of the program is very similar to the numeric histogram program.
import java.util.Scanner;
import java.io.File;
import java.io.IOException;
import java.util.TreeMap;

public class HistoMap {

    public static void main(String[] args) {
        Scanner data = null;
        TreeMap<String,Integer> count;
        String word;
        Integer wordCount;

        try {
            data = new Scanner(new File("alice30.txt"));
        }
        catch (IOException e) {
            System.out.println("Unable to open data file");
            e.printStackTrace();
            System.exit(1);
        }

        count = new TreeMap<String,Integer>();

        while (data.hasNext()) {
            word = data.next().toLowerCase();
            wordCount = count.get(word);
            if (wordCount == null) {
                // get returns null for a word we have not seen yet
                wordCount = 0;
            }
            count.put(word, ++wordCount);
        }

        // TreeMap's keySet() returns the words in sorted order
        for (String i : count.keySet()) {
            System.out.printf("%-20s occurred %5d times%n", i, count.get(i));
        }
    }
}
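The null check inside the while loop is needed because get returns null for a word that is not in the map yet. Since Java 8, Map also has getOrDefault, which behaves like Python’s count.get(w, 0). The sketch below uses it to shorten the counting step. Its class and method names are hypothetical, and it splits a String on whitespace rather than reading alice30.txt.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: word counting with getOrDefault (Java 8 or later).
public class WordCountSketch {

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> count = new TreeMap<String, Integer>();
        // Split on runs of whitespace, like Python's str.split().
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) {
                continue;                   // split yields "" if text starts with whitespace
            }
            // Python: count[w] = count.get(w, 0) + 1
            count.put(w, count.getOrDefault(w, 0) + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Down, down, down. Would the fall"));
        // prints {down,=2, down.=1, fall=1, the=1, would=1}
    }
}
```

Like HistoMap, this keeps punctuation attached to words, so “down,” and “down.” are counted separately. Another standard option is count.merge(w, 1, Integer::sum), which inserts 1 for a new word and otherwise adds 1 to the existing count.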