
Summary of the Main Contributions

The work presented in this dissertation can be summarized as follows. We have proposed a methodology to automatically evaluate the quality of real-time multimedia streams, taking into account the fact that many factors affect the quality, including encoding impairments and the distortion due to transmission over packet networks. The goal of this method is to overcome the limitations of the quality-measuring techniques available in the literature. The advantages of our method are: (i) the results correlate well with human perception, as the procedure is partially built on subjective data; (ii) it is not computationally intensive, since a simple neural network (NN) is used to measure the quality; (iii) there is no need to access the original multimedia signals (before encoding and transmission); they are used only during the development phase, to identify the objective factors from which the NN learns the relation between the original and processed signals; (iv) many factors that cannot be taken into account by traditional methods can easily be included (the frame rate and the use of FEC are two examples among many others).

We used our method to evaluate real-time speech quality, taking into account the following parameters: packet loss rate, loss distribution, packetization interval, several speech codecs, and the spoken language. We showed that, once trained, the NN can accurately evaluate the quality of cases not considered during the training phase. The NN learns quite well how a group of human subjects, participating in subjective quality tests, evaluates speech quality; the trained NN thus reacts to new samples much as human subjects would, and the results correlate very well with those obtained by subjective tests.

We followed the same approach to evaluate real-time video quality. The parameters considered are: packet loss rate, loss distribution, bit rate, frame rate, and encoding type for the H.263 codec. The application of our technique to video streams led to results similar to those obtained for speech.

The above characteristics of our methodology make it possible to use it in real time without significant processing overhead. Since the quality can be measured with high confidence without access to the original signals, it becomes possible to design new communication protocols that exploit an automatic evaluation of the end-user's perception of multimedia quality as a function of objectively measurable parameters such as the loss rate, the frame rate, etc. As an example of such protocols, we designed a new rate control mechanism that combines automated real-time multimedia quality assessment with a TCP-Friendly rate controller. It helps deliver the best possible multimedia quality while avoiding superfluous bandwidth consumption for a given network situation (by determining the exact sending rate to be used, instead of only an upper bound). Based on the quality measured by the trained neural network and on the network conditions (the TCP-Friendly rate controller's suggestions), the controller decides which parameters should be modified and how. Before implementing such a protocol, it is necessary to understand the impact of these parameters on multimedia quality.
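As a rough illustration of how such a rate controller could combine the trained network with a TCP-Friendly bound, the sketch below selects the sending parameters that maximize the predicted quality under the rate limit. It is a minimal sketch only: the predict_mos function is a stand-in for the trained neural network, and the candidate parameter values are hypothetical placeholders, not the implementation developed in this work.

# Minimal sketch of the quality-driven rate control idea described above.
# predict_mos() is a stand-in for the trained neural network; parameter
# names and candidate values are illustrative only.

from itertools import product

def predict_mos(loss_rate, bit_rate, frame_rate):
    """Stand-in for the trained NN: maps measurable parameters
    to an estimated MOS on a 1-5 scale."""
    # Illustrative monotone shape, NOT the trained model of this work.
    base = 1.0 + 4.0 * min(bit_rate / 512.0, 1.0) * min(frame_rate / 30.0, 1.0)
    return max(1.0, base * (1.0 - 5.0 * loss_rate))

def choose_sending_parameters(tfrc_upper_bound_kbps, measured_loss_rate):
    """Pick the (bit rate, frame rate) pair that maximizes predicted
    quality while respecting the TCP-Friendly upper bound."""
    candidate_bit_rates = [64, 128, 256, 384, 512]   # kb/s
    candidate_frame_rates = [10, 15, 25, 30]         # frames/s
    best = None
    for br, fr in product(candidate_bit_rates, candidate_frame_rates):
        if br > tfrc_upper_bound_kbps:
            continue                                  # not TCP-friendly
        mos = predict_mos(measured_loss_rate, br, fr)
        if best is None or mos > best[0]:
            best = (mos, br, fr)
    return best

if __name__ == "__main__":
    print(choose_sending_parameters(tfrc_upper_bound_kbps=300,
                                    measured_loss_rate=0.02))

In this sketch the controller never exceeds the TCP-Friendly bound, and among the admissible combinations it keeps the one with the highest predicted quality; this mirrors the idea of determining the exact sending parameters to use rather than only an upper bound.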
The requirements are that the measured quality must correlate well with human perception and that the analysis must include the combined effect of all the considered quality-affecting parameters at the same time. There are two possible ways to satisfy them: using subjective quality tests, or using a good objective measure that accounts for the direct effect of these parameters over a wide range of values. Unfortunately, the first solution is not practical, as it is very expensive and highly time-consuming: to reach a minimal precision level, a huge number of evaluations is necessary (for example, in the video case, studying 5 parameters would require about 2000 evaluations). As for the second solution, no existing objective measure satisfies both requirements. Our method solves this problem: for the same video case, studying the impact of the 5 selected parameters with our tool gave good results using only 80 subjective evaluations to train the NN and another 14 to cross-validate it. The trained NN then allows us to evaluate the quality over wide ranges of the considered parameters. We followed this approach to study and analyze the quality of both speech and video, with the parameters previously mentioned for each case.

In our work, we used two types of NN, namely Artificial Neural Networks (ANN) and Random Neural Networks (RNN). We strongly recommend the use of RNN, as they offer several advantages in these applications; this claim is based on the numerous experiments we carried out to compare the two tools. Among these advantages: (i) RNN capture more precisely the nonlinear relation between the quality and the affecting parameters; (ii) RNN suffer much less from overtraining; (iii) in the run-time phase, RNN are much faster than ANN for a given architecture. The main drawback of RNN is that the training algorithm previously available for them (gradient descent) can be quite slow to converge to a solution or to a given precision. For example, in the video case, a typical learning phase needs about 5000 iterations and 58538.5 seconds (about 16 hours on a standard workstation) to reach a mean square error of 0.0025.

This problem motivated us to explore new learning techniques for RNN, and led us to propose two new training algorithms. The first is based on the Levenberg-Marquardt method, one of the most powerful training methods for ANN in terms of performance (time required to converge, precision, number of iterations, etc.). The second improves on the general Levenberg-Marquardt method for feedforward networks: it introduces an adaptive momentum term that exploits second-order derivative information of the cost function to further accelerate and enhance training. We have studied these two algorithms, provided the steps and mathematical derivations necessary to use them with RNN, and shown that both outperform the available gradient descent algorithm. For the video problem discussed in this document and the same architecture, the second proposed algorithm reaches the error goal in only 7 iterations and 47.37 seconds. We have also evaluated their performance on other problems and provided some variants of both techniques.
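For reference, the sketch below shows one step of the generic Levenberg-Marquardt update on which the first proposed algorithm builds. It is written for an arbitrary differentiable model with weight vector w, Jacobian J and residual vector e; it does not reproduce the RNN-specific derivations or the adaptive-momentum variant developed in this thesis.

# Minimal sketch of a generic Levenberg-Marquardt step:
#     w <- w - (J^T J + mu I)^(-1) J^T e
# where J[i, j] = d residual_i / d w_j and e[i] = output_i - target_i.

import numpy as np

def levenberg_marquardt_step(w, jacobian, residuals, mu):
    """One LM update for an arbitrary differentiable model.

    w         : current weight vector, shape (n,)
    jacobian  : J, shape (m, n)
    residuals : e, shape (m,)
    mu        : damping factor (large -> gradient-descent-like,
                small -> Gauss-Newton-like behaviour)
    """
    J, e = jacobian, residuals
    H = J.T @ J + mu * np.eye(w.size)   # damped approximate Hessian
    step = np.linalg.solve(H, J.T @ e)  # solve rather than invert H
    return w - step

In practice the damping factor mu is decreased after a successful step and increased otherwise, which makes the method interpolate between gradient descent and Gauss-Newton behaviour.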
We have also presented a new model for traffic prediction that makes use of both the long-range and the short-range periodicity of the traffic process to provide more accurate results. Existing models concentrate on the short-range information only and neglect the long-range information, which is very important for characterizing the traffic process more precisely.
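A toy illustration of this idea is sketched below: the next traffic sample is predicted from a short-range autoregressive term over the most recent samples and a long-range term taken one full period earlier. The period, window length and weighting are hypothetical placeholders; the actual model and its fitting procedure are those developed in the corresponding chapter.

# Toy illustration of combining short-range and long-range (periodic)
# information in a traffic predictor; coefficients are assumed fitted.

import numpy as np

def predict_next(traffic, period, ar_coeffs, seasonal_weight):
    """Predict the next traffic sample from:
       - the last len(ar_coeffs) samples (short-range component), and
       - the sample observed one full period ago (long-range component)."""
    short_range = np.dot(ar_coeffs, np.asarray(traffic)[-len(ar_coeffs):][::-1])
    long_range = traffic[-period]
    return (1.0 - seasonal_weight) * short_range + seasonal_weight * long_range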