In out approach => In our approach

(in 3.2.1) processing modules.+ => processing modules.

(in 3.2.2) Local decomposition control with classification complexity estimation could be applied only to classification problem, because classification complexity methods could be applied only in classification. => ????? (this is circular: the reason merely restates the claim; please rephrase)

some part of problem require much more attention than other. => some parts of the problem require much more attention than others.

(in 3.2.3) a data extraction scheme can be used. => a feature extraction scheme can be used.

(in 4.2.2) with use of preprocessing techniques like thinning and slope correction. => with use of preprocessing techniques like thinning and slope correction [CL96].

in to two sub-spaces => into two sub-spaces

two meurons => two neurons

one Processing Units => one Processing Unit

few epochs => a few epochs ("few" without "a" implies "almost none")

where o_n represent the system output... => do not indent this paragraph (the same holds for all other paragraphs that follow an equation; insert, e.g., "\noindent"), and o_n should be O_n

step n . => step n.

in term of => in terms of

What do the x-axis and the y-axis represent in Fig. 4.2?

Below you will find my comments on 4.1.1. Since we are running out of time, the comments below need not be reflected or taken into account in the final version of the dissertation; instead, I hope you enjoy the discussion of the topics.

-------------------

[1] To verify the decomposition, compare the results with the case of no decomposition.

You wrote what is, in my humble opinion, probably the most important part:

"The reduction of number of data processed by one Processing Units while keeping the examples similar inside one learning database makes processing easier and faster: few epochs or recursion are needed to reach a 10^{-6} mean square error. The difference between the system output and the T-DTS estimated output, in the learning and generalization step are subsequently 1.3x10^{-4} and 2.8x10^{-4}. This is very low value comparing to the amplitudes of signals, consequently 1.2 and 0.31, so the difference between predicted and real signals are hard to note, as seen in the figure 4.4. That assures us that a faithful model was built, proving the efficiency of T-DTS in this case."

Let me paraphrase the above to make it clear.

(1) By both reducing the amount of data and making the examples similar before they are given to one Processing Unit, decomposition makes the processing easier and faster; that is, only a few epochs or recursions are needed to reach a 10^{-6} mean square error.

(2) The difference between the actual output and the output estimated by T-DTS was 1.3x10^{-4} in the learning step and 2.8x10^{-4} in the generalization step. These are very low if we compare them to the amplitudes of the signals, namely 1.2 and 0.31, respectively. The difference is hard to notice, as can be seen in Figure 4.4. This suggests that the model built was faithful, which proves the efficiency of T-DTS in this case.

Assuming my paraphrase is correct, my question is: "Faster and easier than what?" This refers to paragraph (1). O.K., I admit that the model built here is faithful enough and that T-DTS is therefore efficient. What if, however, the original data were given to only one Processing Unit, without being decomposed? What if the results turned out to be the same as when the original data are not decomposed? This is one of the things I am most interested in, and I found no such comparison here. A minimal sketch of the comparison I have in mind follows below.
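To make the comparison in [1] concrete, here is a minimal sketch in Python with scikit-learn. It is only an illustration under stand-in assumptions, not your setup: a toy signal replaces equation (4.1.2), KMeans with two clusters stands in for the 2-neuron competitive Decomposition Unit, and an MLPRegressor trained with lbfgs stands in for the LM-trained Processing Units. The point is only the shape of the experiment: one global model on the full database versus local models on the decomposed sub-databases, with error and training time reported for both.

import time
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 294).reshape(-1, 1)      # toy input, NOT equation (4.1.2)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=294)

def fit_global(X, y):
    # Baseline: one Processing Unit sees the whole database.
    t0 = time.time()
    model = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                         max_iter=2000, random_state=0).fit(X, y)
    return mean_squared_error(y, model.predict(X)), time.time() - t0

def fit_decomposed(X, y, k=2):
    # T-DTS-like: split with a 2-cluster DU stand-in, then train one local PU per sub-database.
    t0 = time.time()
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    err = np.empty_like(y)
    for c in range(k):
        m = labels == c
        pu = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                          max_iter=2000, random_state=0).fit(X[m], y[m])
        err[m] = y[m] - pu.predict(X[m])
    return float(np.mean(err ** 2)), time.time() - t0

print("global     (MSE, seconds):", fit_global(X, y))
print("decomposed (MSE, seconds):", fit_decomposed(X, y))

If the single global model reaches a comparable error in comparable time, the "easier and faster" claim needs the baseline numbers; if it does not, reporting that contrast would strengthen section 4.1.1 considerably.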
[2] The experimental results should be reproducible.

Sorry for my ignorance, but, assuming I know what an ARMAX(6.6) vector is, the process for obtaining the conclusion is as follows, isn't it? First, the 294 inputs sampled from equation (4.1.2) are given to the 1st DU, and that DU is "a competitive network of 2 neurons", right? Do you think I could construct these 3 DU's with only the information given here? Let us further assume that I could construct the 3 DU's; then how would I compare the goodness of my decomposition results against yours? In Figure 4.2 you do not state what the x-axis and the y-axis mean. The same question arises for the 4 PU's.

Neither LM learning nor Levenberg-Marquardt learning is described anywhere, not even in the references. Of course, this is an already established learning method, but should we really expect every reader to know what it is? IMHO, the assumption should not be "everyone should know everything." You might say that LM learning or Kohonen's SOM is popular enough not to need a reference, yet you do give a reference for "clustering": "The purpose of Decomposition Unit is to divide the database into several sub-databases. This task is referred in the literature as clustering [Har75]."

In addition, adding references helps us to distinguish your proposals from those of others. For example, the description of "AVStd" first appears in section 3.2.2 without a reference, which implies that the concept of AVStd is not someone else's but your own proposal. However, the fact that you sometimes do not give a reference for a concept taken from other literature makes it difficult to tell whether a described concept is yours or not. As a matter of fact, the idea of AVStd is very good indeed; it is a shame to leave it unclear whether it is your proposal or not. Let's write, "We propose here...". One more thing I might add: you introduce AVStd for the first time in Chapter 4, in section 4.2.1, but you already employed this idea earlier, in section 4.1.1, right? If so, the explanation given in 4.2.1 concerning AVStd, as well as the Competitive Network, should be given in 4.1.1.

[3] A dissertation should be self-contained.

For example, "Competitive Network" relies on [RZ85], but that reference is cited only once, in Appendix A.5.4.

[4] Please allow me to ask a stupid question.

What if we decompose the original data intentionally at random, keeping the number and the sizes of the sub-databases the same as in your experiment? Wouldn't that improve the efficiency, the time, or whatever, compared to the results obtained by T-DTS? A sketch of this control experiment follows after [5]. If that is not the case, my last question would be as follows.

[5] Is the resultant decomposition the optimal one?

This is what I am most interested in. If the resultant decomposition is not the optimal one, even if it is better than a trivial strategy like the one above, then addressing [4] and [5] (for example, with a GA) would be future work.
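To make [4] concrete as well, here is a hedged sketch of the random-partition control, again in Python with scikit-learn and again under stand-in assumptions (a toy signal instead of equation (4.1.2), KMeans instead of the competitive DU, MLPRegressor instead of the LM-trained PUs). It keeps the number and the sizes of the sub-databases fixed and only randomizes which samples fall into which sub-database.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 294).reshape(-1, 1)      # toy input, NOT equation (4.1.2)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=294)

def local_mse(X, y, labels):
    # Train one local model per sub-database and return the overall training MSE.
    err = np.empty_like(y)
    for c in np.unique(labels):
        m = labels == c
        pu = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                          max_iter=2000, random_state=0).fit(X[m], y[m])
        err[m] = y[m] - pu.predict(X[m])
    return float(np.mean(err ** 2))

# The clustered split (stand-in for the real decomposition) ...
clustered = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ... versus a random partition with exactly the same sub-database sizes.
order = rng.permutation(len(clustered))
randomized = np.empty_like(clustered)
randomized[order] = np.sort(clustered)

print("clustered partition MSE:", local_mse(X, y, clustered))
print("random    partition MSE:", local_mse(X, y, randomized))

If the clustered partition does not clearly beat the random one, that would support the question in [5] and motivate a search over decompositions (for example, with a GA) as future work.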