A Simple Task on Processing a Single
Sequence: LinqObj4
Creating a Project Template and Getting Acquainted with the Task.
Additional Features of the Taskbook Window Related to
Viewing File Data
The tasks in the LinqObj group are designed to reinforce skills in
applying various methods of the LINQ to Objects interface. Unlike the
tasks in the LinqBegin group, these tasks are not focused on studying
any particular type of query; when performing them, it is required to
independently select the LINQ methods that ensure obtaining the
required result. Another difference from the tasks in the LinqBegin group
is the more complex nature of the initial sequences: their elements
are records consisting of several fields. These
features bring the tasks of the LinqObj group closer to real-world problems
that arise when processing complex data structures.
We will consider a relatively simple task related to
processing a single sequence.
LinqObj4°. The initial sequence contains information about clients of the fitness center. Each element of the sequence includes the following integer fields: <Year> <Month number> <Session length (in hours)> <Client code> For each client present in the source data, determine the total session length for all years (output the total length first, then the client code). The information about each client should be displayed on a new line and ordered by descending total length, and if they are equal — by ascending client code.
The project template for this task, as for any tasks
performed using the Programming Taskbook problem book,
should be created using the PT4Load software module.
After creating this project, automatically launching
the Visual Studio environment and loading the created project into it, the screen
will display the LinqObj4.cs file, into which the solution to the
task needs to be entered. Let's present the contents of this file:
// File: "LinqObj4"
using PT4;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace PT4Tasks
{
public class MyTask : PT
{
// To read strings from the source text file into
// a string sequence (or array) s, use the statement:
// s = File.ReadLines(GetString());
// To write the sequence s of IEnumerable<string> type
// into the resulting text file, use the statement:
// File.WriteAllLines(GetString(), s);
// When solving tasks of the LinqObj group, the following
// additional methods defined in the taskbook are available:
// (*) Show() and Show(cmt) (extension methods) - debug output
// of a sequence, cmt - string comment;
// (*) Show(e => r) and Show(cmt, e => r) (extension methods) -
// debug output of r values, obtained from elements e
// of a sequence, cmt - string comment.
public static void Solve()
{
Task("LinqObj4");
}
}
}
Compared to files created for solving tasks in the LinqBegin group,
template files for tasks in the LinqObj group have the following features:
- the
System.IO namespace import directive has been added to the list of using directives,
which defines classes related to file input-output;
- the
MyTask class contains a comment describing the actions
that need to be performed to input the initial sequences
and output the results.
When performing tasks from the LinqObj group, there is no need to use the
input-output methods GetEnumerableInt, GetEnumerableString
and Put, since both the initial and resulting sequences
are contained in text files, for working with which it is sufficient
to use the facilities of the standard .NET library. Among the
additional input-output methods provided by the taskbook, in
tasks of the LinqObj group, only the methods
intended for inputting integers and strings are required: GetInt and GetString
respectively.
Running the created program template, we will see on the screen the taskbook window
containing the task description, as well as an example of the initial
data and correct results:
As in the introductory run of any task, the window contains three
sections: with the task description, initial data, and an example of
the correct solution. At the beginning of the section with initial data, two
text strings are specified, accompanied by the comments "Source file" and "Resulting
file". The first string defines the name of the text file
containing the initial sequence. This file is automatically
created by the taskbook during task initialization; having received its name,
the student's program will be able to access it and read its data.
The second string defines the name of the text file in which the
resulting string sequence should be contained. The student's
program must fill the specified file with the required data, after
which the taskbook will analyze the contents of this file, comparing it with the correct solution variant.
During each test run, the file names, as well as their
contents, change.
The taskbook window displays not only the names but also the contents of
the files associated with the task. Each line of the text file
is enclosed in quotes and displayed on a separate screen line; next to
the first line of the file, its sequence number, equal to 1, is indicated. In
the initial data and results sections, the file contents are highlighted in
light-cyan color (color highlighting allows distinguishing file
lines from other data, as well as from comments). In the section with the correct
solution, all data is displayed in gray to distinguish them from
the "real" data found by the student's program.
The window variant shown in the previous figure corresponds to the
abbreviated display mode of file data, in which for each
file only the initial part of its contents is displayed (from one to
five lines). The ellipsis (...) placed at the bottom of those sections that display abbreviated data indicates that part
of the data is missing. The abbreviated display mode is convenient for
initial acquaintance with the task, as it allows displaying the
contents of all sections in a relatively small window. For
a more detailed analysis of the data, as well as when comparing the obtained
erroneous results with the example of the correct solution, the
full display mode of the text file contents should be used.
The concluding part of this section will be devoted to describing
various capabilities related to the full display mode.
To switch between the abbreviated and full display modes
of file data, just press the [Ins] key or
double-click the mouse in one of the sections with file
data. You can also click on the square marker that
appears in the upper right corner of the initial data section if the window
contains file data. The image on this marker serves as
an indicator of the mode: the variant with an arrow pointing down
denotes the abbreviated display mode (when hovering the mouse over
the marker in this case, the tooltip "Expand contents of the text file (Ins)"
is displayed); the variant with an arrow pointing up
denotes the full display mode (associated with it is the tooltip
"Collapse contents of the text file (Ins)").
The following figure shows the window view in the full display mode of
text files. In this mode, the sequence number is indicated before
each file line. When closing the window, the current display mode
is remembered and restored during subsequent program runs.
In case the window size is insufficient to display all
data, the window is equipped with a scroll bar, and, moreover, additional
markers are displayed on it (see the figure). Scrolling the
window contents is easiest to perform using the [Home],
[End], [Up], [Down], [PgUp], [PgDn] keys, as well as using the mouse wheel.
The group of markers displayed in the upper left corner of the section with
the task description is intended for quickly displaying
various sections related to the task: clicking on the marker
ensures transition to the beginning of the next section (see the figure), clicking on
the marker — to the beginning of the previous section. Browsing sections
is performed cyclically. The marker allows switching between
sections with results and the example of the correct solution, if the window
contains both these sections. Instead of clicking on the indicated markers,
you can press the corresponding key: [+], [–] or [/].
Finally, note the marker , appearing in the upper right corner of
the section with the task description when the scroll bar is displayed in the window.
This marker (and the associated [Del] key) allows
hiding the section with the description, thereby increasing the window area for
displaying data from other sections. Note that the [Del] key
allows hiding the section with the description even if the marker
is not displayed on the screen.
Performing the Task
Having completed the review of the additional capabilities of the taskbook window and
familiarized ourselves with the LinqObj4 task, let's proceed to perform it.
In this task, as in all tasks of the LinqObj group, the initial
sequence is contained in an external text file. To read
all lines of this file, it is easiest to use the ReadLines method
of the File class, which returns the file contents as a sequence of
IEnumerable<string>:
File.ReadLines(GetString())
The first parameter of this method defines the file name; we obtain
this name using the GetString function. There is also a second, optional parameter,
which defines the file encoding; by default, the UTF-8 encoding is used.
Starting from version 1.3 of the PT for LINQ taskbook, in all tasks of this group, the source files contain only characters
from the ASCII set, whose encoding coincides with UTF-8, so the second parameter does not need to be specified.
During subsequent processing of the initial sequence, we
will need to access individual fields of its elements, so after
reading the lines from the source file, it is necessary to convert them into
a set of fields. For this, it is sufficient to apply the Select method to the obtained string
array, defining for the initial string element an element of an anonymous type with the required fields:
File.ReadLines(GetString())
.Select(e =>
{
string[] s = e.Split(' ');
return new
{
hours = int.Parse(s[2]),
code = int.Parse(s[3])
};
})
We used a lambda expression containing not a return
value, but a set of statements, including the return statement. This is related to the
fact that to define the fields, it is necessary to first perform
splitting of the initial string by the Split method. Note that if you break the
text of the lambda expression into separate lines, the code editor
will automatically format the obtained text, presenting the
result in a form similar to the one above.
A variant of the lambda expression with a return value is also possible,
but it requires calling the Split method twice:
.Select(e => new
{
hours = int.Parse(e.Split(' ')[2]),
code = int.Parse(e.Split(' ')[3])
})
Such a variant is quite acceptable for simple data sets
containing a small number of fields.
Note that we included in the anonymous type only data related
to the session length (field hours) and the client code (field code),
since information about the year and month is not required for performing the LinqObj4 task.
At this stage of solving the task, debug
output of the obtained sequence can be performed by adding the auxiliary method Show
to the chain of called methods. Let's present the text of the Solve function,
corresponding to the current solution stage:
public static void Solve()
{
Task("LinqObj4");
File.ReadLines(GetString())
.Select(e =>
{
string[] s = e.Split(' ');
return new
{
hours = int.Parse(s[2]),
code = int.Parse(s[3])
};
})
.Show();
}
When running the program, a debug section will appear in the taskbook window,
containing information about the obtained sequence. Let's present
the view of this window in the abbreviated file data display mode,
additionally hiding the section with the description. The information
panel contains the message "Some required data are not input",
since in our program the name of the results file has not yet been input.
The debug section demonstrates how data of an anonymous type
are converted to their string representation: this representation
contains a list of pairs field name = field value, enclosed in curly
braces. Comparing the initial string data and the data output in the
debug section, we are convinced that the conversion was performed correctly.
The task requires determining the total session length
for each client; for this, grouping of the
obtained sequence by client codes should be performed. Then it is necessary to
sort the grouped sequence by two
keys: the main one (total session length) — in descending order and
the secondary one (client code) — in ascending order. The sorted
sequence must be converted into a sequence of strings using the projection method Select.
Using the simplest variant of the GroupBy method, we obtain
the following chain of methods, which should continue the previously
built chain:
.GroupBy(e => e.code)
.OrderByDescending(e => e.Sum(c => c.hours))
.ThenBy(e => e.Key)
.Select(e => e.Sum(c => c.hours) + " " + e.Key)
In the provided variant, the sum of the hours fields has to be calculated twice:
during sorting and during subsequent projection. To avoid this,
a variant of the GroupBy method with two
parameters can be used, defining a new anonymous type for the elements of the grouped
sequence:
.GroupBy(e => e.code,
(k, ee) => new {k, sum = ee.Sum(c => c.hours)})
.OrderByDescending(e => e.sum).ThenBy(e => e.k)
.Select(e => e.sum + " " + e.k)
In the expression used in the Select method, there is no need
to perform special actions to convert numerical data into
their string representations, since, according to the rules of the C# language,
such conversion is performed automatically for all operands of the
sum if at least one of these operands is of type string (in our
case, such an operand is the separator space).
It remains to write the obtained string sequence into the
text file with the specified name. If the sequence is preliminarily associated
with the variable r, then to save it in a text
file, it is sufficient to execute the following statement (recall that this
statement is indicated in the comment included in the template for the task):
File.WriteAllLines(GetString(), r);
When executing this statement, first the name of the results file is determined, which is read
from the set of initial data using the GetString function, and then
the elements of the string sequence are written to the file; by default,
the UTF-8 encoding is used.
Combining the obtained fragments, we get the first variant of the
correct solution:
public static void Solve()
{
Task("LinqObj4");
var r = File.ReadLines(GetString())
.Select(e =>
{
string[] s = e.Split(' ');
return new
{
hours = int.Parse(s[2]),
code = int.Parse(s[3])
};
})
.GroupBy(e => e.code,
(k, ee) => new { k, sum = ee.Sum(c => c.hours) })
.OrderByDescending(e => e.sum).ThenBy(e => e.k)
.Select(e => e.sum + " " + e.k);
File.WriteAllLines(GetString(), r);
}
To check the correctness of the obtained solution (as well as solutions
to other tasks in the LinqObj group), it must be tested on nine
different test sets of initial data, among which there will be
sets of relatively large size. As an example,
here is the taskbook window with a message about the successful completion of the task,
in which a set of 68 records was provided for processing:
Concluding the discussion of the LinqObj4 task, let's present a variant of its
solution using query expressions:
public static void Solve()
{
Task("LinqObj4");
var r =
from e in File.ReadLines(GetString())
let s = e.Split(' ')
select new
{
hours = int.Parse(s[2]),
code = int.Parse(s[3])
}
into e
group e.hours by e.code
into e
let sum = e.Sum()
orderby sum descending, e.Key
select sum + " " + e.Key;
File.WriteAllLines(GetString(), r);
}
Let's comment on the features of this variant.
Since in the select construct, only an expression can be specified,
defining the element of the returned sequence, we performed the call to the
Split method in the let construct, saving its result in a
auxiliary variable s and using it further in the select
construct.
The results of executing the select and group constructs (i.e.,
the intermediate sequences) we passed to the following
fragments of the query expression using the query continuation
construct into. During automatic
formatting of the query expression performed by the Visual Studio environment
editor, each fragment starting with the query continuation construct is highlighted with an additional indent. Pay
attention to the fact that when continuing the query, the name
of a previously used enumerator variable can be used, since previously
introduced enumerators are not accessible in the subsequent part of the query expression.
The group construct, unlike the GroupBy method, does not allow explicitly
defining the type of elements of the resulting sequence (as a result
of grouping, a sequence of elements of type
IGrouping<K, E> is always returned), however, there is a possibility to explicitly indicate
which fields from the initial set should be kept in the grouped
sequence. By specifying the field e.hours after the word group, we
ensured the construction of a sequence of elements including the key
(property Key) and a numerical sequence — a set of numbers taken from the
hours fields and corresponding to this key. This allowed, when
calling the Sum method, to do without a clarifying lambda expression. To
avoid a double call to the Sum method, we saved its
result in an auxiliary variable sum, which was defined using
another let construct.
The obtained solution variant has somewhat greater
clarity than the first variant, primarily due to the absence of
lambda expressions.
|