NLREG includes a full programming language that you can use to describe the
function that is to be fitted to the data values. Within the language you
can call many built-in library functions.

The NLREG language is
modeled after the C programming language and shares the same syntax and
many of the same language statements. Here is an example of a complete
NLREG program:

Title "Piecewise linear function";
Variables X,Y;
Parameter B0;
Parameter B1;
Parameter B2;
Constant Pivot=5; // X value at which bend occurs
if (x < Pivot) {
Function Y = B0+B1*(X-Pivot);
} else {
Function Y = B0+B2*(X-Pivot);
}
Plot residual;
Rplot grid;
Data;
{ data goes here }
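For readers who know C, the sketch below shows what the example program computes for a single observation: the **if** statement selects which **function** statement supplies the predicted value of Y. The function name `predict_y` and the parameter values used to exercise it are hypothetical. NLREG's job is to find the B0, B1 and B2 that minimize the residual sum of squares, not just to evaluate the model.

```c
/* Pivot: the X value at which the bend occurs (Constant Pivot=5). */
#define PIVOT 5.0

/* Hypothetical C rendering of the piecewise linear model.  Both line
   segments pass through (PIVOT, b0), so the fitted curve is continuous. */
double predict_y(double x, double b0, double b1, double b2)
{
    if (x < PIVOT)
        return b0 + b1 * (x - PIVOT);   /* slope b1 left of the pivot    */
    else
        return b0 + b2 * (x - PIVOT);   /* slope b2 at or after the pivot */
}
```

For example, with B0=1, B1=2 and B2=3, the predicted value is -3 at X=3, 1 at X=5, and 7 at X=7.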

### Program Comments

The beginning of a comment is denoted with
"//" (two consecutive slash characters). Everything from the "//"
sequence to the end of the line is treated as a comment. Comments may be
on lines by themselves or on the ends of other statements. You can also
specify a comment by beginning the comment with the "/*" character
sequence. All characters following this are treated as comments up to the
matching "*/" sequence. The following lines illustrate both types of
comments:

// Function to be fitted
y = a + b*x; // Simple linear equation
/*
* This is a comment.
*/
z = y / 5; /* This is a comment too */

### Required Statements

Every program file must contain the following statements: **variables**,
**parameters**, **function**, and **data**. The **data**
statement introduces the data for the analysis and must be the last
statement in the file (data records may follow it). Other, optional,
statements may be interspersed in the program file.

### Title Statement

**Title** "*string*"; (optional) -- Specifies a title line that is
printed with the results of the analysis. Note: the title string must be
enclosed in quotation marks.

### Variables Statement

**Variables** *var1,var2,...*; (required) -- Specifies the names of
the input variables whose values will be read from your data file. The
order of the variable names must match the order of the data values on each
observation record. You may define more variables than you actually use in
the function specification. A maximum of 500 variables may be specified.
The length of a variable name is limited to 10 characters. Capitalize the
variable names as you want them displayed in the results. The keyword
"**variable**" may be used instead of "**variables**".

You may
specify all of the variables on a single statement or you may use multiple
**variables** statements. If you use multiple statements, the order in
which they appear in the program file must match the order of the variable
values on each observation record. The **variables** statement must
precede the **function** statement.

You can also use the
**double** statement to declare variables (see below). The difference is
that the **variables** statement declares variables that are read from
the input file whereas the **double** statement declares variables whose
values will be computed by statements in your program file.

### Parameters Statement

**Parameters**
*param1*[=*initial1*],*param2*[=*initial2*],...;
(required) -- Specifies the names of the parameters whose values are to be
determined by NLREG. NLREG is capable of handling up to 500 parameters. The
parameter names may not exceed 10 characters in length. Do not specify any
parameters that are not used in the analysis. The **parameters**
statement must precede the **function** statement. The keyword
"**parameter**" may be used instead of "**parameters**".

Optionally, an initial estimate of the parameter value may be specified by
following the parameter name with an equal sign and the value. If no value
is specified, 1 is used by default. Specifying an initial value that is
near the actual value usually speeds up the operation of NLREG and may
enable it to successfully converge to a solution. If NLREG is unable to
converge to a solution, try specifying different starting values for the
parameters.

The **constrain** statement (see below) can be used to limit the range
of values for parameters. The **sweep** statement can be used to
perform the regression analysis with a range of parameter initial values.

### Double Statement

**Double** *var1*[=*value*],*var2*[=*value*],...;
(optional) -- Specifies the names of computed variables that you may use
subsequently to hold calculated values. NLREG allows you to define up to
500 computed variables. All variables hold double precision (64 bit)
floating point values. Optionally, the name of a variable may be followed
by an equal sign and a value to which the variable is initialized. If you
do not specify an initial value, the variable is initialized to 0. The
following are examples of **double** statements:

double t1,t2;
double roomtemp=73;

It is convenient to use computed variables for intermediate calculations
such as transformed values of input variables.

### Array Declarations

NLREG allows you to declare arrays with one or two dimensions. To do this,
follow the name of the variable with the number of array elements enclosed
in square brackets. If the array has two dimensions, specify the number of
rows and then the number of columns, separated by a comma. (Note: this is
different from the C language syntax for declaring a two-dimensional
array.) The following statements declare a one-dimensional array (i.e., a
vector) with 20 elements and a two-dimensional array with 5 rows and 10
columns:

double xvec[20];
double ya[5,10];

You may assign initial values to arrays by following the variable
declaration with an equal sign and a list of values enclosed in curly
braces. In the case of a two-dimensional array, the values should be
specified by rows (i.e., the last subscript varies most rapidly). The
following are examples of array declarations with initializations:

double xvec[5] = {2,5,7,1,0};
double xa[2,3] = {2.3,7.5,1.2,4.4,2.6,7.3};
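The row-by-row rule means that element [row, col] of a two-dimensional array is item `row*ncols + col` of the flat initializer list. The C sketch below applies that rule to the `xa[2,3]` example; the names `xa_init` and `xa_at` are hypothetical helpers for illustration, not NLREG features. (In C itself the same array would be declared `xa[2][3]`.)

```c
#define XA_ROWS 2
#define XA_COLS 3

/* The initializer list exactly as written in the NLREG declaration. */
static const double xa_init[XA_ROWS * XA_COLS] = {2.3, 7.5, 1.2, 4.4, 2.6, 7.3};

/* Element xa[row, col] with zero-based subscripts: the rows are stored one
   after another, so the column (last) subscript varies most rapidly. */
double xa_at(int row, int col)
{
    return xa_init[row * XA_COLS + col];
}
```

So `xa[0,2]` holds 1.2, the last value of the first row, and `xa[1,0]` holds 4.4, the first value of the second row.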

### Array Subscripts

When used in expressions the subscript values are 0 based. That is, the
first element of the array is referenced using a subscript value of 0 and
the last element is referenced using a subscript value equal to one less
than the number of elements in the array. For example, the following
statements would declare an array with 100 elements and initialize it:

double xsq[100],i;
for (i=0; i<100; i++) {
xsq[i] = i;
}

### Constant Statement

**Constant** *variable*=*value*; (optional) -- Specifies the
name of a symbolic constant and associates a numeric value. You can then
use the symbolic name where you would use a number and the corresponding
constant numeric value will be substituted. This is useful when you are
trying out different models and want to easily be able to change a constant
value for each run. The following is an example of a symbolic constant
named "Roomtemp" that causes the value 73 to be substituted in the
function:

Variable Time; // Cooling time in seconds
Variable Temp; // Temperature of object
Constant Roomtemp = 73; // Ambient temperature
Parameter InitTemp; //Initial temperature
Parameter Coolrate; // Cooling rate factor
Function Temp = Roomtemp + InitTemp * exp(-Coolrate * Time);

### Constrain Statement

**Constrain** *parameter*=*lowvalue,highvalue*; (optional) --
Specifies a lower and upper limit on the range of a parameter value.
During the solution process, NLREG may allow a parameter's value to
temporarily move in a direction away from its final value. With some
functions it may be necessary to constrain the parameter's value so that it
does not go negative (e.g., if the function takes the square root of the
parameter), or zero (if the parameter is in a denominator).

Only a single parameter and its associated limits may be specified on each
**constrain** statement, but you may use multiple **constrain**
statements. The **parameters** statement must precede the
**constrain** statement. Use the **constant** statement if you wish
to define a parameter with a fixed value.

The parameter value is allowed to range from *lowvalue* to
*highvalue*. If you want to prevent a parameter value from going to
zero, you must specify a value greater than zero for the low value
(specifying zero would allow it to reach, but not go below, zero). For
example, the following statement keeps the value of age in the range .0001
to 100, so it stays greater than zero and never exceeds 100:

constrain age = .0001,100;
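The text does not say how NLREG enforces the limits internally. One simple way to picture the effect is clamping: whenever the algorithm proposes a value outside [*lowvalue*, *highvalue*], the value is pulled back to the nearest limit. The C sketch below is written under that assumption only, and `constrain_value` is a hypothetical name. It also shows why a low limit of .0001 keeps `age` strictly positive while a low limit of 0 lets it reach zero.

```c
/* Assumed model of a Constrain statement's effect (clamping):
   keep a proposed parameter value within [low, high]. */
double constrain_value(double proposed, double low, double high)
{
    if (proposed < low)
        return low;      /* pulled up to the lower limit   */
    if (proposed > high)
        return high;     /* pulled down to the upper limit */
    return proposed;     /* already inside the range       */
}
```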

### Sweep Statement

**Sweep** *parameter*=*lowvalue,highvalue,stepsize*;
(optional) -- Specifies that the regression analysis is to be performed
repeatedly with a set of starting values for the parameter. The first
analysis is performed with *lowvalue* as the parameter's starting value;
the value of *stepsize* is then added to the parameter's starting value and
the analysis is performed again. The process is repeated until the starting
value of the parameter reaches *highvalue*.

Each time the analysis is performed the value of the residual sum of
squares is compared with the best previous result. The estimated values of
the parameters for the best starting value are saved and used for the final
analysis and report.

Only one parameter may be specified on each **sweep** statement, but you
may have as many **sweep** statements as there are parameters. The
number of regression analyses performed is the product of the numbers of
starting values generated by the individual **sweep** statements. For
example, two **sweep** statements that generate 5 and 4 starting values
cause 20 analyses to be performed.

The **sweep** statement is useful when you are trying to fit a
complicated function that may have "local minima" in addition to the
"global minimum". Periodic functions (sin, cos, etc.) are especially
troublesome.
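The sweep idea can be sketched in C under several stated assumptions. `local_fit` is a hypothetical stand-in for one complete NLREG analysis, implemented here as plain gradient descent on a toy one-parameter objective with two minima: a shallow local one near p = 2 and a deeper global one near p = -2. `sweep_count` assumes that *highvalue* itself is included when it is reached exactly. None of these names are NLREG functions.

```c
/* Toy objective standing in for the residual sum of squares:
   local minimum near p = +2, global minimum near p = -2. */
double objective(double p)
{
    double q = p * p - 4.0;
    return q * q + p + 3.0;
}

/* Derivative of objective(). */
double objective_slope(double p)
{
    return 4.0 * p * (p * p - 4.0) + 1.0;
}

/* Hypothetical stand-in for one complete analysis: gradient descent from
   the starting value.  Returns the final objective value and stores the
   final parameter estimate in *estimate. */
double local_fit(double start, double *estimate)
{
    double p = start;
    for (int i = 0; i < 5000; i++)
        p -= 0.001 * objective_slope(p);
    *estimate = p;
    return objective(p);
}

/* Starting values generated by  sweep p = low,high,step;
   that is, low, low+step, ... up to and including high. */
int sweep_count(double low, double high, double step)
{
    return (int)((high - low) / step + 1e-9) + 1;
}

/* Run one analysis per starting value and keep the best result. */
double sweep(double low, double high, double step, double *best_estimate)
{
    int n = sweep_count(low, high, step);
    double best = 0.0;
    for (int k = 0; k < n; k++) {
        double estimate;
        double value = local_fit(low + k * step, &estimate);
        if (k == 0 || value < best) {   /* smaller objective wins */
            best = value;
            *best_estimate = estimate;
        }
    }
    return best;
}
```

Sweeping the starting value from -3 to 3 in steps of 1 finds the global minimum near p = -2, whereas a sweep from 1 to 3 only ever reaches the local minimum near p = 2. With two sweep statements, the number of analyses would be the product of the two `sweep_count` values.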

### Function Statement

**Function** *depvar* = *function*; (required) -- Specifies
the form of the function whose parameters are to be determined. The
dependent variable must be the only thing to the left of the equal sign.
The expression to the right of the equal sign may contain parameters,
input variables, computed variables (declared using the **double**
statement), constants, operators, and library functions such as sqrt, sin,
exp, etc. The **variables** and **parameters** statements must
appear in the program file before the **function** statement. You may use
more than one **function** statement if you use **if** or other
conditional statements to select which one will be executed. However,
during each execution of your program file one, and only one,
**function** statement must be executed. Some example **function**
statements are shown below:

Function y = p0 + p1*x;
Function distance = .5 * accel * time^2;
Function value = price + yrdep*age + miledep*miles;
Function populatn = base * exp(growrate * time);

### Correlate Statement

**Correlate** [*var1,var2,...*]; (optional) -- Causes NLREG to
compute and print a correlation matrix. If you do not specify a list of
variables, the correlation matrix includes all input variables. If you
wish to control exactly which variables are included in the matrix, or if
you wish to include computed variables (declared with a **double**
statement), you may specify a list of variables.

### Covariance Statement

**Covariance;** (optional) -- Causes the variance-covariance matrix for
the parameters to be printed.

### Confidence Statement

**Confidence** [*percent*]; (optional) -- Specifies that a
confidence interval is to be printed for each estimated parameter. The
purpose of regression analysis is to determine the best estimate of
parameter values. However, as with most statistical calculations, the
values determined are estimates of the true values. The **confidence**
statement causes NLREG to print a table showing the range of possible
values for each parameter given a specified confidence value. The
*percent* parameter specifies the probability that the actual
value of the parameter is within the confidence interval to be computed.
For example, the statement

Confidence 95;

specifies that the confidence interval(s) are to be computed such that
there is a 95 percent probability that the actual values of the parameters
are within the intervals (or that there is a 5 percent chance that the
parameters are outside the intervals). The "percent" parameter may range
from 50 to 99.999. If the **confidence** statement is used without
specifying a percent value, 90 is used by default.
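The text does not spell out how the interval is computed. The conventional construction for a regression parameter, which is assumed here rather than taken from NLREG's documentation, is a Student t interval centered on the estimate. Here $\hat{p}_j$ is the estimate of parameter $j$, $\mathrm{SE}(\hat{p}_j)$ is its standard error, $n$ is the number of observations, and $k$ is the number of parameters:

$$
\hat{p}_j \;\pm\; t_{1-\alpha/2,\;n-k}\,\mathrm{SE}(\hat{p}_j), \qquad \alpha = 1 - \frac{\textit{percent}}{100}
$$

With `Confidence 95;`, $\alpha = 0.05$, so the multiplier is the 97.5th percentile of the t distribution with $n-k$ degrees of freedom. A larger *percent* value gives a larger multiplier and therefore a wider interval.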

### Tolerance Statement

**Tolerance** *value*; (optional, default=1E-10) -- Specifies the
tolerance factor that is used to determine when the algorithm has converged
to a solution. Reducing the tolerance value may produce a slightly more
accurate result but will increase the number of iterations and the running
time. The tolerance value must be in the range 1E-15 to 1E-1.

### Iterations Statement

**Iterations** *value*; (optional, default=50) -- Specifies the
maximum number of iterations that should be attempted by the algorithm. If
the solution does not converge to the limit specified by the
**tolerance** statement (or to the default tolerance) before the maximum
number of iterations is reached, the process is stopped and the results are
printed. Failure to converge before the specified number of iterations
could be caused by one of three things:

- The maximum allowed number of iterations might be too small. Try using
an **iterations** statement with a larger value.

- The tolerance factor may be too small. Even a properly converging
solution will eventually "level off" or oscillate around a good, but
non-zero, sum of squares value. Try using the **tolerance** statement
to increase the tolerance value.

- The function may not be converging. Try specifying better (or at least
different) starting values for the parameters on the **parameters**
statement. Consider using the **sweep** statement to specify a range of
parameter starting values.
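The interplay of **tolerance** and **iterations** can be sketched as a loop that stops either when the relative change in the sum of squares falls within the tolerance or when the iteration limit is reached. NLREG's exact convergence test is not documented here, so the relative-change criterion, the toy objective, and the names `FitResult` and `fit` are all assumptions made for illustration.

```c
/* Outcome of the hypothetical iterative fit below. */
typedef struct {
    double estimate;    /* final parameter value          */
    double sse;         /* final sum of squares           */
    int iterations;     /* iterations actually performed  */
    int converged;      /* 1 if the tolerance test passed */
} FitResult;

/* Toy sum of squares with its minimum (0.5) at p = 3. */
double toy_sse(double p)
{
    return (p - 3.0) * (p - 3.0) + 0.5;
}

/* Improve the estimate repeatedly; stop when the relative change in the
   sum of squares is within `tolerance` or after `max_iterations`. */
FitResult fit(double start, double tolerance, int max_iterations)
{
    FitResult r = { start, toy_sse(start), 0, 0 };
    while (r.iterations < max_iterations) {
        /* gradient step with rate 0.25: halves the distance to p = 3 */
        double next = r.estimate - 0.5 * (r.estimate - 3.0);
        double next_sse = toy_sse(next);
        double change = r.sse - next_sse;
        if (change < 0.0)
            change = -change;
        r.estimate = next;
        r.sse = next_sse;
        r.iterations++;
        if (change <= tolerance * r.sse) {
            r.converged = 1;
            break;
        }
    }
    return r;
}
```

Starting from 10, a tolerance of 1E-10 converges in about twenty iterations, a tolerance of 1E-1 converges after only a few, and an iteration limit of 5 with the tight tolerance stops before convergence. This mirrors the trade-off described above between tolerance, iteration count, and running time.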

### Output Statement

**Output** [**to** "*file*"] *var1,var2,...*; (optional) --
Specifies that after the analysis is completed, data values are to be
written to a file. One record is written for each data observation in the
input file. If the "**to** *file*" portion of the statement is
specified, the output is written to the specified file. If this portion of
the statement is omitted, the output values are written to the listing file
along with the results of the analysis. If a file name is specified
without an extension, ".OUT" is used by default.

The list of variable names determines which variables are written to the
file and the order in which the values appear in each output record. Any
variable previously declared with a **variables** or **double**
statement may be specified. In addition, the following system variable
names may appear in the output list:

- **obs** -- The observation record number, starting at 1 and
increasing by 1.
- **predicted** -- The predicted value for the dependent variable for
the observation, given the independent variable values and the parameters
as calculated by the analysis.
- **residual** -- The difference between the actual value of the
dependent variable and its predicted value.
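The three system variables can be pictured with a short C sketch that builds one output record per observation. The straight-line model y = p0 + p1*x, its already-fitted parameter values, and the names `OutputRecord`, `make_record` and `print_records` are assumptions for illustration. Only the definitions of **obs** (starting at 1), **predicted**, and **residual** (actual minus predicted) come from the text above.

```c
#include <stdio.h>

/* One output record holding the three system variables. */
typedef struct {
    int obs;            /* observation number, starting at 1           */
    double predicted;   /* value of the function for this observation */
    double residual;    /* actual value minus predicted value          */
} OutputRecord;

/* Build the record for observation `index` (zero-based) of the
   hypothetical model y = p0 + p1*x with already-fitted parameters. */
OutputRecord make_record(int index, const double *x, const double *y,
                         double p0, double p1)
{
    OutputRecord r;
    r.obs = index + 1;
    r.predicted = p0 + p1 * x[index];
    r.residual = y[index] - r.predicted;
    return r;
}

/* Print one line per observation, in the order obs, predicted, residual. */
void print_records(FILE *out, const double *x, const double *y,
                   int n, double p0, double p1)
{
    for (int i = 0; i < n; i++) {
        OutputRecord r = make_record(i, x, y, p0, p1);
        fprintf(out, "%d %g %g\n", r.obs, r.predicted, r.residual);
    }
}
```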

### Poutput Statement

**Poutput** "*file*"; (optional) -- The **poutput** statement
specifies that NLREG is to write the final estimated values of the
parameters to a file. Each parameter value is written to a separate line of
the file. This statement is useful to create a file of estimated parameter
values to be fed into another analysis program. This statement can also be
used to determine the parameter estimates to more significant digits than
displayed in the printed listing because the format used by the
**poutput** statement writes the values with 18 significant digits.

### Plot Statement

**Plot** [*options*]; (optional) -- Display a plot of the
calculated function and the data observations. Each data point is
displayed with a blue 'X'; the function that NLREG fits to the data is
superimposed as a yellow curve.

The **plot** statement can only be used if the **function**
declaration meets the following requirements: (1) there must be only a
single independent variable; (2) the independent variable must be an input
variable (i.e., declared with a **variables** statement, not a
**double** statement). You may use symbolic constants declared with
the **constant** statement. If the function does not meet these
requirements you may produce different types of plots using the
**splot**, **rplot** and **nplot** statements.

The following options may be specified on the **plot** statement:

- **nogrid** -- suppress the grid lines that are normally displayed
with the plot.
- **title**="*string*" -- specify a title to be displayed with
the plot. If no title is specified, the title defined by the **title**
statement is used.
- **notitle** -- suppress the title for the plot, which by default
is the title specified with the **title** statement.
- **xlabel**="*string*" -- specify a label to be printed along
the X axis. If you do not use this qualifier, the name of the variable whose
values determine the X coordinates is used as the default label.
- **noxlabel** -- suppress printing any label along the X axis.
- **ylabel**="*string*" -- specify a label to be printed along
the Y axis. If you do not use this qualifier, the name of the variable whose
values determine the Y coordinates is used as the default label.
- **noylabel** -- suppress printing any label along the Y axis.
- **domain**=*lowvalue,hivalue* -- specify the domain over
which the plot is to be generated. If no domain is specified, NLREG uses
the range of the independent variable for the domain.
- **residual** -- draw vertical lines from each observed data point to
the corresponding point on the calculated function line. These lines
represent the "residual" values whose sum of squares NLREG is attempting
to minimize. See also the descriptions of the **rplot** and **nplot**
statements.

If more than one option is specified, separate them with commas. For example, to
produce a plot with X and Y axis labels, use a statement with the following form:

PLOT XLABEL="Time",YLABEL="Blood concentration";

### Splot Statement

**Splot** [*options*]; (optional) -- Display a scatter plot of
(X,Y) data points. Using the **xvar** and **yvar** options (see
below) you can specify which variable is used for the vertical (Y)
dimension and which is used for the horizontal (X) dimension. Any type of
variable may be specified including input variables, computed variables
(declared with the **double** statement), the dependent variable of the
function, and the system variables **predicted**, **residual**, and
**obs**.

You may display two scatter plots on the same image. This is useful for
comparing computed values with input values. To do this use the
**xvar2** and **yvar2** options to specify the variables for the X
and Y dimensions for the second plot. Each data point for the primary plot
(specified by **xvar** and **yvar**) is marked with a blue 'X'. The
data points for the second plot (specified by **xvar2** and
**yvar2**) are marked with yellow triangles. You can use the
**connect** and **connect2** options to draw straight-line segments
through the points. The **nomark** and **nomark2** options may be
used to suppress the data point markers.

The following options may be specified on the **splot** statement:

- **xvar**=*variable* -- specify the variable to be used for the
horizontal (X) dimension of the first set of plotted points. This can be
any type of variable, input or computed. If you do not specify this option
and there is only a single independent variable in the function, it is used
by default.
- **yvar**=*variable* -- specify the variable to be used for the
vertical (Y) dimension of the first set of plotted points. This can be any
type of variable, input or computed. If you do not specify this option,
the dependent variable of the function (i.e., the one on the left of the
equal sign) is used by default.
- **xvar2**=*variable* -- specify the variable to be used for the
horizontal (X) dimension of the second set of plotted points. This can be
any type of variable. If you specify **yvar2** but not **xvar2**,
the default is the same variable as specified by **xvar**.
- **yvar2**=*variable* -- specify the variable to be used for the
vertical (Y) dimension of the second set of plotted points.
- **connect** -- connect the first set of points by straight-line
segments. The points are displayed and connected in the same order that
they appear in the data file.
- **connect2** -- connect the second set of points by straight-line
segments.
- **nomark** -- suppress the display of the 'X' symbols that normally
mark the first set of data points. This can be used with **connect** to
cause only the line to be drawn.
- **nomark2** -- suppress the display of the triangle symbols that
normally mark the second set of data points.
- **nogrid** -- suppress the grid lines that are normally displayed
with the plot.
- **title**="*string*" -- specify a title to be displayed with
the plot. If no title is specified, the title defined by the **title**
statement is used.
- **notitle** -- suppress the title for the plot, which by default
is the title specified with the **title** statement.
- **xlabel**="*string*" -- specify a label to be printed along
the X axis. If you do not use this qualifier, the name of the variable whose
values determine the X coordinates is used as the default label.
- **noxlabel** -- suppress printing any label along the X axis.
- **ylabel**="*string*" -- specify a label to be printed along the
Y axis. If you do not use this qualifier, the name of the variable whose
values determine the Y coordinates is used as the default label.
- **noylabel** -- suppress printing any label along the Y axis.
- **domain**=*lowvalue,hivalue* -- specify the domain over
which the plot is to be generated. If no domain is specified, NLREG uses
the range of the horizontal variable(s) for the domain.

If there is more than one option, separate them with commas. The following is an
example **splot** statement:

splot xvar=time,yvar=sodium,yvar2=potassium,connect,connect2,
title="Blood concentration over time",
xlabel="Time (hours)",ylabel="Sodium & Potassium";

### Rplot Statement

**Rplot** [*options*]; (optional) -- Display a plot of the
residual values. A "residual value" (or error deviation) is the difference
between an actual value of the dependent variable for an observation and
the predicted value based on the function fitted by the regression
analysis. If the calculated function exactly predicted the actual
observation values, all of the residual values would be zero. However,
this is usually not the case and the residual values show where, and by how
much, the fitted function fails to predict the actual observations.

The **rplot** statement causes NLREG to display a plot showing the
residual values on the vertical (Y) axis. The variable plotted along the
horizontal (X) axis may be specified using the **xvar** option (see below).
You may specify any variable, including the dependent variable and computed
variables declared with the **double** statement. If you do not specify a
variable and there is a single independent variable in the function it is
used. The X axis label indicates which variable was used.

A residual plot is very useful for determining if the form of the function
being fitted is appropriate for the data values. If the residual values
are randomly distributed in positive and negative directions then the form
(shape) of the fitted function is probably appropriate for the data and the
deviations are due to random measurement errors. If, however, the
residuals show a systematic pattern such as a periodic cycle, then the
function may not be appropriate for the data values.

The following options may be specified on the **rplot** statement:

- **xvar**=*variable* -- specify which variable is to be used for
the horizontal (X) dimension of the plot. You may specify any variable,
including independent input variables, the dependent variable of the
function (i.e., the one on the left of the equal sign), and computed or
transformed variables declared with the **double** statement. If there
is only a single independent variable, NLREG will use it by default. The
label along the X axis indicates which variable was used.
- **nogrid** -- suppress the grid lines that are normally displayed
with the plot.
- **title**="*string*" -- specify a title to be displayed with
the plot. If this option is not specified, the default title is "Plot of
residuals".
- **notitle** -- suppress the title for the plot, which by default
is "Plot of residuals".
- **xlabel**="*string*" -- specify a label to be printed along
the X axis. If you do not use this qualifier, the name of the variable whose
values determine the X coordinates is used as the default label.
- **noxlabel** -- suppress printing any label along the X axis.
- **ylabel**="*string*" -- specify a label to be printed along
the Y axis. If you do not use this qualifier, the default label is
"Residual".
- **noylabel** -- suppress printing any label along the Y axis.
- **domain**=*lowvalue,hivalue* -- specify the domain over
which the plot is to be generated. If no domain is specified, NLREG uses
the range of the X dimension variable.

If more than one option is specified, separate them with commas.

### Nplot Statement

**Nplot** [*options*]; (optional) -- Display a normal probability
plot of the residual values. In this plot, the actual value of each
residual is plotted on the vertical (Y) axis and the expected value of the
residual, assuming the residuals are normally distributed, is plotted on
the horizontal (X) axis. If the residuals are normally distributed, the
resulting plot will be a straight line passing through the origin with a
slope of 1 (i.e., the actual value of each residual should equal the
expected value from the normal distribution). If the residuals are not
normally distributed, the plot will deviate from a straight line. NLREG
displays a red line along which the 'X' marks would fall if the
residuals were normally distributed.

NLREG also computes the correlation between the actual residual values
and their expected values and displays the correlation coefficient in the
title line "(r=*n.nnn*)". If the residual values are normally
distributed, the correlation should be close to 1.000. A correlation value
less than 0.940 suggests that the residuals are not normally distributed.

The following options may be specified on the **nplot** statement:

- **grid** -- display grid lines to make it easier to estimate values.
- **title**="*string*" -- specify a title to be displayed with
the plot. If no title is specified, the default title is "Normal probability
plot".
- **notitle** -- suppress the title for the plot.
- **xlabel**="*string*" -- specify a label to be printed along
the X axis. If you do not use this qualifier, the default label is "Expected
residuals".
- **noxlabel** -- suppress printing any label along the X axis.
- **ylabel**="*string*" -- specify a label to be printed along
the Y axis. If you do not use this qualifier, the default label is "Actual
residuals".
- **noylabel** -- suppress printing any label along the Y axis.

If more than one option is specified, separate them with commas.

### Assignment Statement

The assignment statement is an executable statement that evaluates an
expression and assigns its value to a variable. The syntax for an
assignment statement is:

variable = expression; // Assign expression to variable
variable += expression; // Add expression to variable
variable -= expression; // Subtract expression from variable
variable *= expression; // Multiply variable by expression
variable /= expression; // Divide variable by expression

where *variable* is a variable that was previously declared using a
**double** statement. The variable may be subscripted if it is an
array. *expression* is a valid arithmetic or logical expression
following the rules explained earlier. If the expression involves a
relational comparison operator (e.g., <, >, >=, etc.) or a logical
operation (&&, ||, !), the value 1 is used for true and 0 for false. The
expression may contain any type of variable (input, computed, or constant)
along with parameters and library functions.
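Because a relational or logical operator yields 1 for true and 0 for false, a comparison can be used directly as an arithmetic value. C behaves the same way, so the rule can be shown with a C sketch that combines a compound assignment with a comparison to count how many values exceed a limit without an **if** statement. The function name `count_above` is hypothetical; the corresponding NLREG statement would be `count += (x > limit);` with `count` declared by a **double** statement.

```c
/* Count how many of the n values exceed limit.  The comparison
   (values[i] > limit) evaluates to 1 when true and 0 when false,
   so it can be added directly with the += operator. */
double count_above(const double *values, int n, double limit)
{
    double count = 0.0;
    for (int i = 0; i < n; i++)
        count += (values[i] > limit);
    return count;
}
```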

### If Statement

The syntax of the **if** statement is:

**if** (*expression*) *statement1* [**else**
*statement2*]

If the *expression* is true (not zero), *statement1* is executed. If
the expression is false (0) and the **else** clause is specified,
*statement2* is executed. The **else** clause and its controlled
statement are optional. You may control groups of statements by
enclosing them in braces. The following are examples of valid **if**
statements:

if (x > bigx) bigx = x;
if (x < Pivot) {
Function Y = B0+B1*(X-Pivot);
} else {
Function Y = B0+B2*(X-Pivot);
}

### While Statement

The **while** statement loops until the controlling expression becomes
false (0) or a **break** statement is executed within the loop. The form
of the **while** statement is:

**while** (*expression*) {

*controlled statements*

}

Here is an example of a **while** statement:

while (x < 5) {
x = x + xmove;
y = y + ymove;
}

### Break and Continue Statements

Each time around a **for**, **while** or **do** loop, the control
expression is evaluated. If it is true (non-zero), the controlled
statements are executed and the process repeats until the expression
becomes false. If a **break** statement is executed within the loop,
execution of the loop terminates and control is transferred to the first
statement beyond the end of the loop. If a **continue** statement is
executed in the loop, the remaining controlled statements are skipped and
control is transferred to the loop's conditional test (for a **do** loop,
the test at the end of the loop).
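Since NLREG's loops are modeled after C's, the effect of **break** and **continue** can be illustrated in C. In this sketch, whose function name `sum_until_negative` is hypothetical, **continue** skips zero entries and **break** ends the loop at the first negative value. Note that in a C **for** loop, **continue** also evaluates the loop-end expression (`i++` here) before the test.

```c
/* Add up the values in order, skipping zeros and stopping at the first
   negative value.  Returns the sum and stores in *used the number of
   values that were actually added. */
double sum_until_negative(const double *values, int n, int *used)
{
    double sum = 0.0;
    *used = 0;
    for (int i = 0; i < n; i++) {
        if (values[i] < 0.0)
            break;              /* leave the loop entirely         */
        if (values[i] == 0.0)
            continue;           /* skip the rest of this iteration */
        sum += values[i];
        (*used)++;
    }
    return sum;
}
```

For the values 2, 0, 3, -1, 10 the result is 5: the zero is skipped and the loop stops at -1, so 10 is never added.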

### Do Statement

The **do** statement is very similar to the **while** statement
except the control expression is evaluated at the end of the loop rather
than the beginning. This causes the loop always to be executed at least
once. The form of the **do** statement is:

**do** {

*controlled statements*

} **while** (*expression*);

Here is an example of a **do** statement:

do {
x += xstep;
y += ystep;
} while (x < limit);
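The practical difference between **while** and **do** shows up when the control expression is false from the start. This C sketch, with hypothetical function names, counts how many times each loop body runs for the same starting value and limit. C's `while` and `do` loops behave as described above.

```c
/* Number of times a while loop body runs: the test comes first. */
int while_passes(double x, double step, double limit)
{
    int passes = 0;
    while (x < limit) {
        x += step;
        passes++;
    }
    return passes;
}

/* Number of times a do loop body runs: the test comes last,
   so the body always runs at least once. */
int do_passes(double x, double step, double limit)
{
    int passes = 0;
    do {
        x += step;
        passes++;
    } while (x < limit);
    return passes;
}
```

Starting at x = 10 with a limit of 5, the **while** body never runs but the **do** body runs once; starting at x = 0 with a limit of 3 and a step of 1, both run three times.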

### For Statement

The **for** statement is a looping control statement similar to the
**while** statement; however, the **for** statement also allows you
to specify initialization expressions that are executed once at the
beginning of the loop, and loop-end expressions that are executed at the
end of each loop cycle. The form of the **for** statement is:

**for** (*expression1*; *expression2*; *expression3*)
*statement*;

Execution of a **for** statement proceeds as follows:

1. Evaluate *expression1*. Typically this expression will include
assignment operators ("=") to set initial values for loop variables. If
you need more than one initial expression, specify them as a list separated
by commas.
2. Evaluate *expression2*. If its value is false (0), terminate the
**for** statement and transfer control to the statement that follows the
controlled statement. If *expression2* is true, proceed to the next
step.
3. Execute the controlled statement. If more than one statement is to be
controlled, enclose them with brace characters ("{" "}").
4. Evaluate *expression3*. This expression will typically contain
operators such as "++", "+=", "--", or "-=" to modify the value of a loop
variable.
5. Transfer control to step 2, where *expression2* is once again
evaluated.
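The steps above amount to rewriting a **for** statement as a **while** loop. The C sketch below, with hypothetical function names, computes the same sum both ways. Its first expression initializes two variables in a comma-separated list, as described for *expression1*.

```c
/* Sum of i*i for i = 0 .. n-1 written as a for loop.  The first
   expression initializes two variables, separated by a comma. */
double sum_squares_for(int n)
{
    double total;
    int i;
    for (i = 0, total = 0.0; i < n; i++)
        total += (double)i * i;
    return total;
}

/* The same computation with the steps spelled out as a while loop. */
double sum_squares_while(int n)
{
    double total;
    int i;
    i = 0, total = 0.0;          /* expression1: evaluated once          */
    while (i < n) {              /* expression2: tested before each pass */
        total += (double)i * i;  /* the controlled statement             */
        i++;                     /* expression3: end of each pass        */
    }
    return total;
}
```

Both versions return 30 for n = 5 (0 + 1 + 4 + 9 + 16), and both return 0 for n = 0 because *expression2* is false on the first test.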

### Stop Statement

The **stop** statement terminates the calculations for the current
iteration. The last value assigned to the dependent variable (as specified
with a **function** statement) is used as the calculated value of the function.
An implicit stop occurs if you "fall through" the last executable
statement. The form of the **stop** statement is:

stop;

### Dataskip Statement

**Dataskip** *count*; (optional) -- Specifies how many lines at the
front of the data file should be skipped when reading the data records.
Normally, NLREG assumes the first record of your data file contains the
first data observation for the analysis. However, some data files produced
by other programs may begin with one or more other types of lines (titles,
comments, etc.) before the first actual data observation.

### Data Statement

**Data** ["*file*"]; (required) -- Specifies the name of the file
containing the data records, or introduces the data records which follow
the statement. If a file name is specified on the **data** statement,
the file is opened, its data records are read, and the regression analysis
is performed. If a file name is specified without an extension, ".DAT" is
used by default. Note that if you specify a file name, it must be enclosed
in quotation marks.

If no file name is specified on the **data** statement, the data records
must immediately follow the **data** statement in the program file.

Each data record must contain at least as many data values as the number of
variables specified on the **variables** statement(s). The order of the
variables as specified on the **variables** statement must match the
order of the values in each observation. Any data values beyond those
required for the specified variables are ignored. Each observation must
begin on a new line.

Each set of data values (i.e., a record) is specified as a series of
numeric ASCII values terminated by a carriage-return/line-feed.
Alternatively, you can use a semicolon to terminate a data record rather
than carriage-return/line-feed. The data values within a record must be
separated by one or more spaces and/or a comma. You may place a comment on
the end of a data record by beginning the comment with "//". Data values
may contain decimal points and may be expressed in exponential notation
(i.e., *n.nnnn*E*ppp*).
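The record format rules can be made concrete with a small C parser. It is only a sketch under assumptions. It handles a single record held in a string, treats spaces, tabs, and commas as separators, and stops at a semicolon, a line end, or a "//" comment. It relies on the standard `strtod` function, which accepts decimal points and exponential notation. The name `parse_record` is hypothetical, and NLREG's own reader may differ in details such as error handling.

```c
#include <stdlib.h>

/* Parse the numbers in one data record.  Up to max_values numbers are
   stored in values; any extra numbers are counted but not stored (they
   would be ignored).  Returns the number of values found, or -1 if a
   non-numeric token is encountered. */
int parse_record(const char *line, double *values, int max_values)
{
    int count = 0;
    const char *p = line;
    while (*p != '\0') {
        if (*p == ' ' || *p == '\t' || *p == ',') {   /* separators       */
            p++;
            continue;
        }
        if (*p == ';' || *p == '\r' || *p == '\n')     /* end of record    */
            break;
        if (p[0] == '/' && p[1] == '/')                /* trailing comment */
            break;
        char *end;
        double value = strtod(p, &end);
        if (end == p)
            return -1;                                 /* not a number     */
        if (count < max_values)
            values[count] = value;
        count++;
        p = end;
    }
    return count;
}
```

A caller reading three variables would then reject a record for which `parse_record` returns less than 3, matching the rule that each record must contain at least as many values as there are variables.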

The **data** statement must be the last statement in the program file.
The following is an example of a complete program file including data records:

Variables age,miles,value;
Parameters base,depage,depmiles;
Function value = base + depage*age + depmiles*miles;
Data;
2 10000 13000
4 42000 9000
1 7000 17000
6 52000 6000
5 48000 8000

If the data records had been placed in a separate file named CAR.DAT, the
statements would be as follows:

Variables age,miles,value;
Parameters base,depage,depmiles;
Function value = base + depage*age + depmiles*miles;
Data "car.dat";