Pandas Upload Dataframe to MSSQL Temp Table

In this article, I am going to demonstrate how to connect to databases using a pandas dataframe object. Pandas in Python uses a module known as SQLAlchemy to connect to various databases and perform database operations. In the previous article in this series, "Learn Pandas in Python", I explained how to get up and running with the dataframe object in pandas. Using the dataframe object, you can easily start working with your structured datasets in a way similar to that of relational tables. I would suggest you have a look at that article in case you are new to pandas and want to learn more about the dataframe object.

A brief overview of SQLAlchemy

To describe SQLAlchemy briefly, it can be referred to as an ORM (Object Relational Mapper), written in Python, to work with databases. It helps programmers and application developers have full control and flexibility over SQL tools. Often, while developing applications in any programming language, we come across the need to store and read data from databases. This module provides a pythonic way to create and represent relational databases from within Python projects. An advantage of working with such a module is that you do not need to remember the syntactical differences of the various databases around. The module does all the heavy lifting for you, while you interact with all the databases in the same way.

You can read more about the module on the official website.

Figure 1 – Official website

Installing the module in Python

You can download the latest version of the module from the official website by navigating to https://www.sqlalchemy.org/download.html#current. Also, for the purpose of this tutorial, I am going to create a virtual environment and do all the necessary demos from within that environment. You can run the following command to install it in your environment in Python.
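A typical sequence, assuming you use pip and the built-in venv module (the environment name here is just an example), would look like this:

    python -m venv sqlalchemy-env
    source sqlalchemy-env/bin/activate
    pip install sqlalchemy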

Figure 2 – Installing the module in a virtual environment

Once the module has been installed, let us now import it and see if our script runs fine.
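A minimal test script, assuming all you want to confirm is that the import works, could be:

    # Quick sanity check that SQLAlchemy imports correctly
    import sqlalchemy
    print(sqlalchemy.__version__)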

Figure 3 – Testing the script by importing the module

As you can see in the figure above, the modules have been imported successfully. So now, we can begin working with the SQLAlchemy module. In this tutorial, I am going to use PostgreSQL as the database; however, you can use any other database of your choice. This module supports multiple databases like MySQL, SQL Server, SQLite, etc.

Creating the connection engine

In order to be able to connect to the databases, we need to initiate something known as a connection engine. This engine is dependent on the type of database that you are connecting to. It will be used to connect to the database engine when the script is executed. You can create the engine by using the following URI pattern.

'postgresql://username:password@databasehost:port/databasename'

The URI mentioned above is a simple connection string that the module uses to establish a connection with the PostgreSQL database. In the first part, you mention the database flavor that you are connecting to. It can be "mysql" or "mssql" depending on the database that you use; in this case, it is going to be "postgresql". In the second part of the connection string, you specify the username and password that you will be using to connect to the database server. Note that the username and the password are separated by a colon. In the third part, you mention the database hostname and the port on which the database is running, followed by the database name. So, the final command to create the engine using such a connection string will be as follows.
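A minimal sketch of that command, assuming a local PostgreSQL instance on the default port 5432 and placeholder credentials, might be:

    from sqlalchemy import create_engine

    # Replace the username, password, host, port, and database name with your own
    engine = create_engine('postgresql://username:password@localhost:5432/superstore_db')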

Additionally, if you are connecting to a MySQL database, you need to install the "pymysql" package. However, for PostgreSQL, I am going to install "psycopg2", which will enable us to connect to the database engine. You can install the package by running the following command from the terminal.
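A typical invocation with pip, run from inside the activated virtual environment, would be:

    pip install psycopg2
    # or, only if you are connecting to MySQL instead:
    pip install pymysql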

That is all about creating a database connection. Now, we can proceed to use this connection and create tables in the database.

Create a SQL table from a Pandas dataframe

Now that we have our database engine ready, let us first create a dataframe from a CSV file and try to insert it into a SQL table in the PostgreSQL database. I am using a Superstore dataset for this tutorial, which you can download from https://data.world/annjackson/2019-superstore. To read data from a CSV file in pandas, you can use the following command and store it into a dataframe.
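A minimal sketch of that command, assuming the downloaded file is saved as '2019-superstore.csv' in the working directory, could be:

    import pandas as pd

    # Load the Superstore CSV into a dataframe; adjust the path as needed
    df = pd.read_csv('2019-superstore.csv')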

Now, the data is stored in a dataframe, which can be used to perform all the operations. In order to write data to a table in the PostgreSQL database, we need to use the "to_sql()" method of the dataframe class. This method will read data from the dataframe, create a new table, and insert all the records into it. Let us see this in action now. You can use the following code for your reference.
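A minimal sketch, reusing the "df" and "engine" objects created in the earlier steps, might look like this:

    # Create the "superstore" table in PostgreSQL and insert all records from the dataframe
    df.to_sql('superstore', engine)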

When you run the file, a table with the name "superstore" will be created, and you can select the data from the table accordingly.

Figure 4 – Selecting data from the table

As you can see in the above code, the "to_sql()" method takes two arguments: the name of the table to be created and the engine to connect to. These two are the mandatory parameters of this method. However, there are also a few optional parameters to this method, which I would like to discuss; a combined example follows the list.

  • if_exists – This parameter is used to decide what should be done in case the table already exists in the database. By default, pandas will not be able to write data into such a table and will eventually throw an error. You can customize this by providing a value of "replace" if you would like to drop and create a new table every time the code is executed, or a value of "append" if you want to add new records to the table on each execution
  • schema – By default, pandas will write data into the default schema for the database. In PostgreSQL, it is the "public" schema, whereas in SQL Server, it is the "dbo" schema. If you want it to create a table in a different schema, you can pass the name of that schema as the value of this parameter
  • index – This is a Boolean field which, when set to True, writes the dataframe index as a column in the table, which can be used to uniquely identify each row
  • chunksize – This defines a batch of data to be inserted into the table instead of one row at a time. You can specify an integer value, and that will be the size of the batch used to insert the data. This feature is useful if you have a really large dataset and you want to bulk insert data
  • dtype – This is a dictionary that accepts the column names and their datatypes if we need to explicitly declare the datatypes of the fields in the dataframe. The key in the dictionary is the name of the column, and the value is the datatype. This is recommended if you want greater control over the datatypes of your table and do not want to rely on the module to infer them for you
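To tie these together, here is a minimal sketch combining the optional parameters discussed above; the column names and datatypes passed to "dtype" are assumptions based on the Superstore dataset:

    from sqlalchemy.types import Integer, Text

    # Write the dataframe using the optional parameters of to_sql()
    df.to_sql(
        'superstore',
        engine,
        if_exists='replace',   # drop and recreate the table on every run
        schema='public',       # the default schema in PostgreSQL
        index=False,           # do not write the dataframe index as a column
        chunksize=1000,        # insert records in batches of 1,000 rows
        dtype={'Row ID': Integer(), 'Order ID': Text()}  # explicit column datatypes
    )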

This is all about the "to_sql()" method, which pandas provides on the dataframe and which uses the SQLAlchemy engine to insert data into a database table.

Conclusion

In this article, I have explained in detail the SQLAlchemy module that is used by pandas in order to read and write data from various databases. This module can be installed when you install pandas on your machine; however, you need to explicitly import it in your programs if you want to use it. The SQLAlchemy module provides a wrapper around the basic modules for most of the popular databases. Basically, it makes working with databases a lot easier when used in combination with pandas. In my next article in the series, I will explain how to read data from a database using the SQLAlchemy module and will also explain how to execute SQL queries directly.

