Pandas Read Excel – Reading Excel File in Python Pandas


With the Pandas read_excel() function, you can read Excel data straight into a DataFrame object. Excel files are among the most common file formats in IT, and Excel offers a wide range of features for storing and changing data in a structured form.

Excel also provides user-friendly, interactive sheets that accommodate both small and large datasets. With Python, you can load that data and work with it in whatever way you want.

In this post, you will learn how Pandas reads Excel spreadsheets and how to pull the data you need from Excel into Pandas.

Key Points of Pandas Read Excel 

  • It reads files with the xls, xlsx, xlsm, xlsb, odf, ods and odt extensions.
  • It can open files stored in the local file system or served from a URL.
  • For URLs, the supported schemes are http, ftp, s3, and file.
  • It also helps in reading from a single sheet or a list of sheets.
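
As a rough illustration of the last point, the sketch below writes a two-sheet workbook into memory and reads both sheets back in one call. The sheet names and data are made up for this example, and it assumes an Excel engine such as openpyxl is installed.

```python
import io

import pandas as pd

# Write two sheets into an in-memory workbook.
# The sheet names "Technologies" and "Schedule" are hypothetical.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [25000, 20000]}).to_excel(
        writer, sheet_name="Technologies", index=False
    )
    pd.DataFrame({"Courses": ["Java"], "Fee": [15000]}).to_excel(
        writer, sheet_name="Schedule", index=False
    )
buffer.seek(0)

# Passing a list to sheet_name returns a dict of DataFrames keyed by sheet name.
sheets = pd.read_excel(buffer, sheet_name=["Technologies", "Schedule"])
for name, frame in sheets.items():
    print(name, frame.shape)
```

Passing a single name or position to sheet_name returns one DataFrame instead of a dict.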

Pandas Read Excel Sheet 

The Pandas read_excel() function reads data from an Excel sheet. By default, it loads the first sheet of the Excel file and uses the first row as the DataFrame column names. .xlsx is the usual extension for Excel files, but the function also works with other formats, such as xls, xlsm, xlsb, odf, ods, and odt.

Below are some of the features that Pandas read_excel() supports:

  • Skipping rows and columns.
  • Setting the data types of columns.
  • Ignoring the header row and setting your own column names.
  • Setting a column as the index.
  • Reading Excel files from S3, a URL, or the local file system, in many file extensions.
  • Choosing the character that is treated as the decimal point when parsing numbers.

The example below shows how to read an Excel sheet with this function.
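
Several of these features can be combined in one call. The sketch below is a minimal example that builds a small workbook in memory (so no file on disk is needed), skips the original header row, supplies new column names, and uses the first column as the index. The lowercase column names are invented for this example, and an Excel engine such as openpyxl is assumed to be installed.

```python
import io

import pandas as pd

# Build a small workbook in memory; the data mirrors the courses sheet in this post.
courses = pd.DataFrame({
    "Courses": ["Spark", "Pandas", "Java"],
    "Fee": [25000, 20000, 15000],
    "Duration": ["50 Days", "35 Days", None],
    "Discount": [2000, 1000, 800],
})
buffer = io.BytesIO()
courses.to_excel(buffer, index=False)
buffer.seek(0)

df = pd.read_excel(
    buffer,
    skiprows=1,      # skip the header row stored in the sheet
    header=None,     # treat the remaining rows as data only
    names=["course", "fee", "duration", "discount"],  # hypothetical new names
    index_col=0,     # use the first column as the index
)
print(df)
```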

import pandas as pd

# Read Excel file
df = pd.read_excel('c:/apps/courses_schedule.xlsx')
print(df)

# Outputs:
#   Courses    Fee Duration  Discount
# 0   Spark  25000  50 Days      2000
# 1  Pandas  20000  35 Days      1000
# 2    Java  15000      NaN       800
# 3  Python  15000  30 Days       500
# 4     PHP  18000  30 Days       800


Understanding the Pandas read_excel Function

The Pandas read_excel() function takes many parameters. The table below covers the basic ones you need to read Excel files effectively.

| Parameter | Explanation | Available options |
|---|---|---|
| io= | The path to the workbook | A URL to a file, a path to a file, etc. |
| sheet_name= | The sheet to read. Defaults to the first sheet in the workbook (position 0) | A string (sheet name), an integer (sheet position), or a list (multiple sheets) |
| usecols= | The columns to read, if not all columns are needed | Column names, Excel-style column ranges ("A:C"), or integers giving column positions |
| dtype= | The data types to use for each column | A dictionary with column names as keys and data types as values |
| skiprows= | The number of rows to skip from the top | An integer giving the number of rows to skip |
| nrows= | The number of rows to parse | An integer giving the number of rows to read |

Key Parameters of the Pandas .read_excel() Function 

The table above lists the most important parameters you can use with the Pandas .read_excel() function.
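
As a minimal sketch of how a few of these parameters work together, the example below reads only two columns and two rows from a named sheet. The workbook is built in memory from made-up sales data, the sheet name "Sales" is an assumption of this example, and an Excel engine such as openpyxl is assumed to be installed.

```python
import io

import pandas as pd

# Hypothetical sales data written to an in-memory workbook.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2022-04-01", "2022-04-02", "2022-04-03", "2022-04-04"]),
    "Customer": ["A", "B", "A", "B"],
    "Sales": [191, 727, 782, 561],
})
buffer = io.BytesIO()
sales.to_excel(buffer, sheet_name="Sales", index=False)
buffer.seek(0)

subset = pd.read_excel(
    buffer,
    sheet_name="Sales",  # read the sheet by name instead of by position
    usecols="B:C",       # Excel-style range: the Customer and Sales columns
    nrows=2,             # parse only the first two data rows
)
print(subset)
```

Limiting columns and rows this way can be handy for previewing a large sheet before loading it in full.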

How to Specify Data Types in the Pandas read_excel() Function

With Pandas, you can specify the data type of each column while reading the Excel file. This serves three main purposes:

  • It can speed up reading
  • It stops data from being read incorrectly
  • It can save memory

Let’s have a look at how to specify the data types for the columns.

# Specifying Data Types for Columns When Reading Excel Files
import pandas as pd

df = pd.read_excel(
    io='https://github.com/datagy/mediumdata/raw/master/Sales.xlsx',
    # dtype keys must match the column names in the sheet exactly
    dtype={'Customer': 'object', 'Sales': 'int'},
    # the Date column is parsed as a datetime via parse_dates
    parse_dates=['Date'])

print(df.head())

# Returns:
#         Date Customer  Sales
# 0 2022-04-01        A    191
# 1 2022-04-02        B    727
# 2 2022-04-03        A    782
# 3 2022-04-04        B    561
# 4 2022-04-05        A    969

Remember that you do not have to pass every column in dtype; Pandas infers the type of any column you leave out. Do make sure that the keys you pass match the column names in the sheet exactly, which avoids many issues.
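
The sketch below illustrates passing a type for just one column. It uses a small in-memory workbook with made-up data rather than the file above, and assumes an Excel engine such as openpyxl is installed. Only Sales is given a type; the other columns are left for Pandas to infer.

```python
import io

import pandas as pd

# Hypothetical data written to an in-memory workbook.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2022-04-01", "2022-04-02"]),
    "Customer": ["A", "B"],
    "Sales": [191, 727],
})
buffer = io.BytesIO()
sales.to_excel(buffer, index=False)
buffer.seek(0)

# Only Sales is listed in dtype; Date and Customer are inferred.
df = pd.read_excel(buffer, dtype={"Sales": "float64"})
print(df.dtypes)
```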

 
