
Overview of the Method

In this section, we describe the overall steps to follow in order to build a tool that automatically assesses, in real time, the quality of real-time media transmitted over packet networks. Henceforth, by the term ``media'' we mean any speech, audio, video, or multimedia stream. Our goal here is to describe the method independently of the media type at hand; see Figure 4.1 and Figure 4.2.
First, we define the set of static information that affects the overall perceived quality. We must choose the most effective quality-affecting parameters for the media type and application, and for the network that will support the transmission. Then, for each parameter, we must select several values covering its whole possible range, with more values in the sub-range where the parameter is expected to fall most often in practice. For example, if the percentage loss rate is expected to vary from 0 to 10 %, then we may use 0, 1, 2, 5, and 10 % as typical values for this parameter. This choice assumes that the loss rate usually lies between 0 and 5 % and that 10 % is the highest value allowed in the range. (In this case, the quality is assumed to be worst at 10 % loss when the other parameters take normal values.) Note that not all media types tolerate the same parameter values. For example, speech can tolerate a loss rate of up to 20 %, while video may tolerate only up to 5 % loss when there is no error resilience on the encoder side. If we call a configuration of the set of quality-affecting parameters a choice of one value for each parameter, the total number of possible configurations is usually large. We must then select a subset of this large set, which will be used as (part of) the input data of the NN in the learning phase.
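
As an illustration only, the following Python sketch enumerates all configurations of a small parameter set and selects a subset of them. Only the loss-rate values (0, 1, 2, 5, 10 %) come from the text above; the other parameter names and values, the helper names, and the use of uniform random sampling to choose the subset are assumptions made for this sketch.

```python
import itertools
import random

# Hypothetical parameters and candidate values. Only the loss-rate values
# are taken from the text; the bit rates and frame rates are placeholders.
PARAMETER_VALUES = {
    "loss_rate_percent": [0, 1, 2, 5, 10],
    "bit_rate_kbps": [256, 512, 1024],
    "frame_rate_fps": [10, 15, 30],
}


def all_configurations(parameter_values):
    """Return every configuration (one value per parameter) as a dict."""
    names = list(parameter_values)
    combos = itertools.product(*(parameter_values[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def select_configurations(configurations, count, seed=0):
    """Pick a reproducible random subset (assumed selection strategy)."""
    rng = random.Random(seed)
    return rng.sample(configurations, min(count, len(configurations)))


configs = all_configurations(PARAMETER_VALUES)
selected = select_configurations(configs, count=20)
print(len(configs), len(selected))
```

Even with three parameters the full set has 5 x 3 x 3 = 45 configurations, which shows how quickly the number grows and why only a part of it is used.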
To generate a media database composed of samples corresponding to different configurations of the selected parameters (called the ``Distorted Database''), a simulation environment or a testbed must be implemented. It is used to send media sequences from the source to the destination and to control the underlying packet network. Every configuration in the defined input data must be mapped onto the system composed of the network, the source and the receiver. For example, when working with IP networks and video streams, the source controls the bit rate, the frame rate and the encoding algorithm, and it sends RTP video packets; the routers' behavior contributes to the loss rate or the loss distribution, together with the traffic conditions in the network. The destination stores the transmitted video sequence and collects the corresponding values of the parameters. Then, by running the testbed or by using simulations, we produce and store a set of distorted sequences along with the corresponding values of the parameters.
After completing the ``Distorted Database'', a subjective quality test must be carried out. The ITU-R and ITU-T recommendations define several subjective quality methods, depending on the type of media at hand. Details on this step are given in Section 4.3. In general, a group of human subjects is invited to evaluate the quality of the sequences (i.e. every subject gives each sequence a score on a predefined quality scale). The subjects should not be able to establish any relation between the sequences and the corresponding parameter values.
The next step is to calculate the MOS values for all the sequences. Based on the scores given by the human subjects, screening and statistical analysis may be carried out to remove the ratings of individuals suspected of giving unreliable results [73]. See Section 4.4 for more details about this step.
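
The MOS computation can be sketched as follows. This is a minimal illustration, not the screening procedure of [73]: the rule used here (discard a subject whose scores correlate poorly with the per-sequence mean) and the 0.75 threshold are assumptions, and the score table is invented purely to exercise the code.

```python
import math
import statistics


def mean_opinion_scores(scores):
    """scores[subject][sequence] -> list with one MOS per sequence."""
    n_sequences = len(scores[0])
    return [statistics.mean(row[s] for row in scores)
            for s in range(n_sequences)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def screen_subjects(scores, min_correlation=0.75):
    """Keep subjects whose scores follow the overall trend (assumed rule)."""
    mos = mean_opinion_scores(scores)
    return [row for row in scores if pearson(row, mos) >= min_correlation]


# Invented example: 4 subjects rate 5 sequences on a 1-5 scale.
raw_scores = [
    [5, 4, 3, 2, 1],
    [5, 5, 3, 2, 1],
    [4, 4, 3, 1, 1],
    [1, 2, 3, 4, 5],  # grades against the consensus
]
kept = screen_subjects(raw_scores)
mos = mean_opinion_scores(kept)
print(len(kept), mos)
```

In this toy table the fourth subject is discarded, and the MOS of each sequence is then the mean of the remaining three scores.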
After that, we store the MOS values and the corresponding parameter values in a second database (which we call the ``Quality Database''). In the third step, a suitable NN architecture and a training algorithm should be selected. The Quality Database is divided into two parts: one to train the NN and the other to test its accuracy. The trained NN will then be able to evaluate the quality measure for any given values of the parameters. More details about this part are given in Section 4.5.
To put this more formally, we build a set ${\cal S} = \{\sigma_1,\sigma_2,\cdots,\sigma_S\}$ of media sequences that have encountered varied conditions when transmitted and that constitute the ``training part'' of the Quality Database. We also define a set ${\cal P} = \{\pi_1,\pi_2,\cdots,\pi_P\}$ of parameters such as the bit rate of the source, the packet loss rate in the network, etc. Then, we denote by $v_{ps}$ the value of parameter $\pi_p$ in sequence $\sigma_s$, and by $V$ the matrix $V = (v_{ps})$. For $s = 1,2,\cdots,S$, sequence $\sigma_s$ receives the MOS evaluation $\mu_s \in [N,M]$ from the subjective test phase. The goal of the NN is to find a real function $f$ having $P$ real variables and with values in $[N,M]$, such that

(i): for any sequence $\sigma_s \in {\cal S}$, $f(v_{1s},\cdots,v_{Ps}) \approx \mu_s$,
(ii): and such that for any other vector of parameter values $(v_1,\cdots,v_P)$, $f(v_1,\cdots,v_P)$ is close to the MOS that any media sequence would receive if the selected parameters had those specific values $v_1,\cdots,v_P$.
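
Conditions (i) and (ii) can be illustrated with a small sketch in plain Python. It is not the architecture or training algorithm selected in this work: the one-hidden-layer network, the stochastic gradient descent training, the 1 to 5 scale taken for $[N,M]$, and the sample values are all assumptions, with the samples invented only so that the code runs. Each sample pairs a column of $V$ (here two parameters scaled to $[0,1]$) with its $\mu_s$.

```python
import math
import random

N, M = 1.0, 5.0  # assumed bounds of the MOS scale [N, M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNN:
    """One-hidden-layer network f: R^P -> [N, M] (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, v):
        hidden = [sigmoid(sum(w * x for w, x in zip(row, v)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, v):
        # The sigmoid output in (0, 1) is rescaled to the MOS range.
        return N + (M - N) * self._forward(v)[1]

    def train_step(self, v, mu, rate):
        """One gradient step on the squared error for a single sample."""
        hidden, out = self._forward(v)
        target = (mu - N) / (M - N)
        delta_out = (out - target) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= rate * delta_out * h
            self.b1[j] -= rate * delta_h
            for i, x in enumerate(v):
                self.w1[j][i] -= rate * delta_h * x
        self.b2 -= rate * delta_out


def mse(net, samples):
    return sum((net.predict(v) - mu) ** 2 for v, mu in samples) / len(samples)


# Invented (parameter vector, MOS) pairs standing in for the Quality Database.
samples = [
    ([0.0, 1.0], 4.8), ([0.1, 1.0], 4.3), ([0.2, 0.5], 3.6),
    ([0.5, 0.5], 2.7), ([1.0, 0.5], 1.4), ([0.0, 0.5], 4.4),
    ([0.5, 1.0], 3.0), ([1.0, 1.0], 1.6),
]
train, test = samples[:6], samples[6:]  # training part and testing part

net = TinyNN(n_inputs=2, n_hidden=3)
error_before = mse(net, train)
for _ in range(2000):
    for v, mu in train:
        net.train_step(v, mu, rate=0.5)
error_after = mse(net, train)
print(error_before, error_after, mse(net, test))
```

The error on the training part measures how well condition (i) is met, while the error on the held-out part gives an indication of condition (ii).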

Once all the above steps have been completed successfully, we implement the final tool, which is composed of two modules. The first one collects the values of the selected quality-affecting parameters (based on the network state, the codec parameters, etc.). These values are fed into the second one, the trained NN, which takes them as input and instantaneously computes the corresponding MOS quality score.
All the above steps are summarized in the four parts shown in Figure 4.1. In the first part, we identify the quality-affecting parameters, their ranges and values, choose the original sequences, and produce the Distorted Database. In the second part, the subjective quality test is carried out on the Distorted Database and the MOS scores are calculated (together with their statistical analysis) in order to form the Quality Database. In the third part, we select the NN architecture and learning algorithm, and train and test the NN using the Quality Database. Finally, in the fourth part, we implement the final tool, which consists of the two modules (parameter collection and MOS evaluation), in order to obtain a real-time quality evaluator. These steps are also shown as a block diagram in Figure 4.2.
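
The two-module structure of the final tool can be sketched as follows. All names here (`QualityEvaluator`, `collect_parameters`, `stub_model`) are hypothetical, the collector returns fixed values instead of measuring a real network, and the stub model is a placeholder formula standing in for the trained NN.

```python
from typing import Callable, Dict, List, Sequence


class QualityEvaluator:
    """Couples a parameter collector with a model mapping values to a MOS."""

    def __init__(self,
                 parameter_names: Sequence[str],
                 collect: Callable[[], Dict[str, float]],
                 model: Callable[[List[float]], float]) -> None:
        self.parameter_names = list(parameter_names)
        self.collect = collect  # module 1: gathers current parameter values
        self.model = model      # module 2: trained NN (any callable here)

    def evaluate(self) -> float:
        measured = self.collect()
        # Order the values as the model expects them.
        vector = [measured[name] for name in self.parameter_names]
        return self.model(vector)


def collect_parameters() -> Dict[str, float]:
    """Hypothetical collector; a real one would query network and codec."""
    return {"loss_rate_percent": 2.0, "bit_rate_kbps": 512.0}


def stub_model(vector: List[float]) -> float:
    """Placeholder for the trained NN, clamped to an assumed 1-5 scale."""
    loss_rate = vector[0]
    return max(1.0, min(5.0, 5.0 - 0.4 * loss_rate))


evaluator = QualityEvaluator(
    ["loss_rate_percent", "bit_rate_kbps"], collect_parameters, stub_model)
score = evaluator.evaluate()
print(score)
```

Keeping the collector and the model behind separate callables means the same evaluator can be reused when the NN is retrained or when the parameters are measured differently.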


Samir Mohamed 2003-01-08