Panel data, also known as longitudinal data, combines cross-sectional and time series data. It is also called cross-sectional time-series data because it is a collection of observations on multiple subjects at multiple points in time. A panel dataset is usually described by its layout (long or wide) and by its completeness (balanced or unbalanced), which gives the following four terms.

Long Panel Data

Wide Panel Data

Balanced Panel Data

Unbalanced Panel Data

Long Panel Data

Most regressions require data to be in long format. This means there is a row for each entity (e.g., person) at each time point.

If we conduct a 3-decade GDP panel survey of 50 states, each of which responded in all 3 decades, the long format of these data would have 150 rows (50 states x 3 decades).

Wide Panel Data

Wide data, on the other hand, have only one row per entity and a separate column for each measure and time point.

For example, if we conduct a 3-decade panel survey of GDP, unemployment and inflation for 50 states, each of which responded in all 3 decades, the wide format of these data would have 50 rows and 9 value columns (3 measures x 3 decades).
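The difference between the two layouts can be sketched in a few lines of plain Python. This is only an illustration: the state names, GDP figures and the helper names `long_to_wide` and `wide_to_long` are made up for the example, and in practice a data-frame library would normally do this reshaping.

```python
# Hypothetical long-format panel: one row per (state, decade) pair.
long_rows = [
    {"state": "A", "decade": 1990, "gdp": 100.0},
    {"state": "A", "decade": 2000, "gdp": 120.0},
    {"state": "A", "decade": 2010, "gdp": 150.0},
    {"state": "B", "decade": 1990, "gdp": 80.0},
    {"state": "B", "decade": 2000, "gdp": 95.0},
    {"state": "B", "decade": 2010, "gdp": 110.0},
]


def long_to_wide(rows, entity, time, measure):
    """Return one dict per entity, with one column per time point."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[entity], {entity: row[entity]})
        record[f"{measure}_{row[time]}"] = row[measure]
    return list(wide.values())


def wide_to_long(rows, entity, time, measure):
    """Invert long_to_wide: one dict per (entity, time point)."""
    prefix = f"{measure}_"
    long = []
    for record in rows:
        for column, value in record.items():
            if column.startswith(prefix):
                period = int(column[len(prefix):])
                long.append({entity: record[entity], time: period, measure: value})
    return long


wide_rows = long_to_wide(long_rows, "state", "decade", "gdp")
for record in wide_rows:
    print(record)
```

Six long rows (2 states x 3 decades) collapse into two wide rows, each carrying one GDP column per decade.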

Balanced Panel Data

If every entity is observed in every time period, so that all entities have the same number of observations, the dataset is a balanced panel.

Unbalanced Panel Data

In an unbalanced panel dataset, data are not available for some entities in some time periods, or for some of the variables.

This means the dataset does not have all the information we need, i.e., some fields are missing or not available.
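A quick way to tell the two apart is to count the recorded (entity, period) pairs: a balanced panel has exactly one for every combination. The sketch below assumes each observation is identified by such a pair; the function names and the sample data are hypothetical.

```python
def is_balanced(observations):
    """True if every entity appears in every period exactly once."""
    observations = list(observations)
    unique = set(observations)
    entities = {entity for entity, _ in unique}
    periods = {period for _, period in unique}
    no_duplicates = len(unique) == len(observations)
    return no_duplicates and len(unique) == len(entities) * len(periods)


def missing_cells(observations):
    """List the (entity, period) combinations that were never observed."""
    unique = set(observations)
    entities = sorted({entity for entity, _ in unique})
    periods = sorted({period for _, period in unique})
    return [(e, t) for e in entities for t in periods if (e, t) not in unique]


# Hypothetical panels: two states observed over two decades.
balanced = [("A", 1990), ("A", 2000), ("B", 1990), ("B", 2000)]
unbalanced = [("A", 1990), ("A", 2000), ("B", 1990)]  # B is missing 2000

print(is_balanced(balanced), is_balanced(unbalanced))
print(missing_cells(unbalanced))
```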

Examples of Panel Data

Why do we use Panel data?

Panel data incorporates both common and individual variables

Provides a two-dimensional picture of a dataset: time and space

Contains more information, more variability and more efficiency than a single time series or a single cross-section

Can reduce biases that arise if we use only time series or only cross-sectional data

Is widely available and comprehensive, as it captures inter-individual and intra-individual differences over time

Captures more complex human behavior than a single cross-section or time series

Heterogeneity across cross-sectional units and over time that is not captured by the observed variables can be captured by period-invariant individual-specific effects and/or individual-invariant time-specific effects

What does Panel Data look like, and how do we recognise it?

When a dataset has values of a variable over both time and space, it is panel data. If we have only the GDP of different states of India in a particular year, it is cross-sectional data. Similarly, if we have the GDP of a single state over different time periods, e.g. 1960-2019, it is a time series. When we have data for all the states over the period 1960-2019, it is panel data.
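One way to see the relationship is that a panel contains both of the other shapes as slices: fixing the year gives a cross-section, and fixing the state gives a time series. The sketch below uses made-up placeholder numbers, not real GDP figures, and the helper names are hypothetical.

```python
# Hypothetical panel keyed by (state, year); values are placeholders.
panel = {
    ("Kerala", 2017): 1.0, ("Kerala", 2018): 1.1, ("Kerala", 2019): 1.2,
    ("Punjab", 2017): 2.0, ("Punjab", 2018): 2.1, ("Punjab", 2019): 2.3,
}


def cross_section(panel, year):
    """All states in one year: a cross-section."""
    return {state: value for (state, y), value in panel.items() if y == year}


def time_series(panel, state):
    """One state over all years: a time series."""
    return {year: value for (s, year), value in panel.items() if s == state}


print(cross_section(panel, 2018))
print(time_series(panel, "Punjab"))
```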

Methods to deal with unbalanced panels

Some panel models can use only balanced panel data. In that case, an unbalanced panel can be condensed to the period for which all variables have complete information.
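A minimal sketch of that condensing step is shown below. It takes one possible reading of the rule: keep only the periods in which every entity has a row with no missing values. The field names, the use of `None` for a missing value and the sample data are assumptions made for the example.

```python
def condense_to_balanced(rows, entity="entity", time="period"):
    """Keep only periods in which every entity has a fully observed row."""
    entities = {row[entity] for row in rows}
    complete = {}  # period -> entities with no missing values in that period
    for row in rows:
        if all(value is not None for value in row.values()):
            complete.setdefault(row[time], set()).add(row[entity])
    keep = {period for period, seen in complete.items() if seen == entities}
    return [row for row in rows if row[time] in keep]


# Hypothetical unbalanced panel: B has no row for 2000, C lacks GDP in 2002.
rows = [
    {"entity": "A", "period": 2000, "gdp": 1.0},
    {"entity": "A", "period": 2001, "gdp": 1.1},
    {"entity": "A", "period": 2002, "gdp": 1.2},
    {"entity": "B", "period": 2001, "gdp": 2.1},
    {"entity": "B", "period": 2002, "gdp": 2.2},
    {"entity": "C", "period": 2000, "gdp": 3.0},
    {"entity": "C", "period": 2001, "gdp": 3.1},
    {"entity": "C", "period": 2002, "gdp": None},
]

balanced_rows = condense_to_balanced(rows)
print(balanced_rows)
```

Only 2001 survives here, which shows the cost of this approach: it can discard a large share of the observations.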

Problems with Panel Data

Data Management: Panel Data needs to be arranged properly before fitting the model

Heterogeneity: Cross-sectional time-series data spread across space and time, so heterogeneity in the data is inevitable.

If we look at the growth of GDP of Asian countries over the period 1950 to 2019:

Some of the countries achieved independence in the 1950s.

Some of them went through drastic changes in economic policy.

Many countries were affected by terrorism, and so on.

So, heteroskedasticity and multicollinearity might be inherent in such a model.

How to model Panel Data?

There are two main approaches to estimating panel data regression models: the fixed effect approach and the random effect approach.

Fixed Effect Approach

Suppose there are 4 companies and we want to estimate their investment function over 20 years, so that

i = 1, 2, 3, 4

t = 1, 2, 3, …, 20

The model will be Yit = β1 + β2 X2it + β3 X3it + µit, where i indexes the company, t indexes the year, and µit is the error term.

Under the Fixed Effect Approach, five cases can be distinguished.

Case 1

This is the pooled regression case, where the 20 yearly observations for each company are stacked and studied together. It does not give any information on the specific nature of each company. All coefficients remain constant over time and space.
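The pooled case can be sketched as ordinary least squares on the stacked observations, ignoring which company each row belongs to. To keep the arithmetic visible, the example below assumes a single regressor instead of the two in the model above, and uses a tiny made-up panel of two companies; the function name `pooled_ols` is hypothetical.

```python
def pooled_ols(panel):
    """Pooled OLS with one regressor.

    panel: list of (entity, period, x, y) rows. The entity and period
    labels are deliberately ignored. Returns (intercept, slope).
    """
    xs = [x for _, _, x, _ in panel]
    ys = [y for _, _, _, y in panel]
    n = len(panel)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: within each company y rises by 2 per unit of x,
# but company B has both a higher intercept and higher x values.
panel = [
    ("A", 1, 1.0, 3.0), ("A", 2, 2.0, 5.0), ("A", 3, 3.0, 7.0),
    ("B", 1, 4.0, 18.0), ("B", 2, 5.0, 20.0), ("B", 3, 6.0, 22.0),
]

intercept, slope = pooled_ols(panel)
print(round(intercept, 3), round(slope, 3))
```

In this constructed example the pooled slope comes out well above 2, because the single common intercept forces the difference between the two companies into the slope estimate.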

Case 2

This is an improvement over Case 1 since it accounts for the individuality of each company. The slope coefficients remain constant while the intercept varies across individuals. It is also known as the Least Squares Dummy Variable (LSDV) model, since we use dummy variables for the intercepts.
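A common way to obtain the LSDV slope without building the dummy columns is the within transformation: subtract each company's own mean from its x and y values, then run OLS on the demeaned data. The sketch below assumes a single regressor rather than the two in the model above, and uses a small made-up panel; `within_estimator` is a hypothetical name.

```python
from collections import defaultdict


def within_estimator(panel):
    """Fixed-effect (within) estimator with one regressor.

    panel: list of (entity, period, x, y) rows.
    Returns (slope, {entity: intercept}).
    """
    groups = defaultdict(list)
    for entity, _, x, y in panel:
        groups[entity].append((x, y))

    # Entity-specific means of x and y.
    means = {
        entity: (
            sum(x for x, _ in obs) / len(obs),
            sum(y for _, y in obs) / len(obs),
        )
        for entity, obs in groups.items()
    }

    # OLS on deviations from the entity means.
    sxy = sxx = 0.0
    for entity, obs in groups.items():
        x_bar, y_bar = means[entity]
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    slope = sxy / sxx

    # Each entity's intercept is recovered from its own means.
    intercepts = {e: y_bar - slope * x_bar for e, (x_bar, y_bar) in means.items()}
    return slope, intercepts


# Hypothetical data: y = 1 + 2x for company A and y = 10 + 2x for company B.
panel = [
    ("A", 1, 1.0, 3.0), ("A", 2, 2.0, 5.0), ("A", 3, 3.0, 7.0),
    ("B", 1, 4.0, 18.0), ("B", 2, 5.0, 20.0), ("B", 3, 6.0, 22.0),
]

slope, intercepts = within_estimator(panel)
print(slope, intercepts)
```

On this constructed data the common slope of 2 and the two company intercepts (1 and 10) are recovered exactly.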

Case 3

Here, the intercept varies over individuals as well as over time, while the slope coefficients remain constant.
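For a balanced panel, the slope in this case can be sketched with a two-way demeaning: subtract the entity mean and the period mean from each value and add back the grand mean, then run OLS on the result. The example assumes a single regressor and a balanced panel, and the data and function names are made up for illustration.

```python
from collections import defaultdict


def mean_by(panel, key_index, value_index):
    """Mean of one column, grouped by another column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in panel:
        totals[row[key_index]] += row[value_index]
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def two_way_within_slope(panel):
    """Slope with entity and period effects removed (balanced panel only).

    panel: list of (entity, period, x, y) rows.
    """
    n = len(panel)
    demeaned = {}
    for name, col in (("x", 2), ("y", 3)):
        by_entity = mean_by(panel, 0, col)
        by_period = mean_by(panel, 1, col)
        grand = sum(row[col] for row in panel) / n
        demeaned[name] = [
            row[col] - by_entity[row[0]] - by_period[row[1]] + grand
            for row in panel
        ]
    sxy = sum(x * y for x, y in zip(demeaned["x"], demeaned["y"]))
    sxx = sum(x * x for x in demeaned["x"])
    return sxy / sxx


# Hypothetical data built as y = entity effect + period effect + 2 * x,
# with entity effects A=1, B=10 and period effects 0, 3, -1.
panel = [
    ("A", 1, 1.0, 3.0), ("A", 2, 2.0, 8.0), ("A", 3, 4.0, 8.0),
    ("B", 1, 2.0, 14.0), ("B", 2, 5.0, 23.0), ("B", 3, 3.0, 15.0),
]

slope = two_way_within_slope(panel)
print(round(slope, 6))
```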

Case 4

All coefficients vary across individuals but not over time.

Case 5

All coefficients vary across individuals and over time. We may include interactive as well as additive dummies to estimate differential slope coefficients and differential intercepts.

Random Effect Model or Error Components Model

Here, it is assumed that the intercept of an individual unit is a random variable, drawn from a large population, with a constant mean value. The intercept of each unit is then expressed as this common mean plus a unit-specific random error component.
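Written out for the four-company model above, the idea can be sketched as follows. The symbols ε_i (the company-specific component) and w_it (the composite error) are names chosen for this illustration; the usual assumption is that ε_i has mean zero and is uncorrelated with µit and with the regressors.

```latex
\begin{align}
\beta_{1i} &= \beta_1 + \varepsilon_i, \qquad i = 1, 2, 3, 4 \\
Y_{it} &= \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + \mu_{it} \\
       &= \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + w_{it},
       \qquad w_{it} = \varepsilon_i + \mu_{it}
\end{align}
```

The error term now has two components, one that varies only across companies and one that varies across both companies and time, which is where the name Error Components Model comes from.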

How do we decide between the two methods of dealing with Panel Data?

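The Hausman test, whose test statistic follows a chi-square distribution under the null hypothesis, is used to choose the appropriate model. The null hypothesis is that the random effect estimator is consistent; if it is rejected, the fixed effect model is preferred.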
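For a single slope coefficient, the Hausman statistic reduces to the squared difference between the two estimates divided by the difference of their variances, compared against a chi-square distribution with one degree of freedom. The sketch below covers only that scalar case, and the estimates and variances fed into it are invented numbers, not results from any real model.

```python
import math


def hausman_scalar(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient (1 degree of freedom)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this form of the test")
    statistic = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical fixed effect and random effect estimates with their variances.
statistic, p_value = hausman_scalar(b_fe=2.0, var_fe=0.05, b_re=2.6, var_re=0.01)
print(round(statistic, 3), round(p_value, 4))

if p_value < 0.05:
    print("Reject the null: prefer the fixed effect model")
else:
    print("Do not reject the null: the random effect model is acceptable")
```

With several coefficients, the statistic uses the vector of differences and the inverse of the difference of the two covariance matrices, with degrees of freedom equal to the number of coefficients compared.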