SJSU CS157A DBMS1 Dr. Lin 2018-08-28T

Go to the DB1 project page: http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/
Download and run the create_FIRST1a.rtf file http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/create_FIRST1a.rtf on your DB.
Then download and run “populate_FIRST1a .rtf” http://xanadu.cs.sjsu.edu/~drtylin/classes/cs157A/Project/DB1/populate_FIRST1a%20.rtf, the one that has a space in the name, because the one without a space in the name has “FIRST1a_ID_SEQ.nextval” in the queries, which the table creation file “create_FIRST1a.rtf” doesn’t create.

Then execute these commands in turn and review the results:

select sum(salary) from first1a;
-- remember, this poor guy got cancer from the company's parts he was handling
delete from first1a where snum="s3" and pnum="p2";
select sum(salary) from first1a;

Dr. Lin said that this employee wanted time off but also should still be paid. I think the point he was trying to show here is that by removing him from the list of employees who are available to continue working in the company, because of the poor schema design, that he was also removed from sum(salary), so now we don’t know how much the company is paying its employees because the database design can’t properly represent that.

However, I believe Dr. Lin introduced a mistake when he used an aggregate function “sum(salary)”, because each employee’s salary is getting counted each time for the number of parts that employee handles. What we really want to do is see each employee’s salary once, and then sum up the results.

See each employee’s salary once:

select distinct snum,salary
    from first1a;
+------+--------+
| snum | salary |
+------+--------+
| S1   |  40000 |
| S2   |  30000 |
| S3   |  30000 |
| S4   |  40000 |
| S7   |  60000 |
+------+--------+

Then, from that table, select sum(salary):

select sum(salary)
    from (
        -- note that this is the query from above
        select distinct snum,salary
            from first1a
    ) table_alias_1;
+-------------+
| sum(salary) |
+-------------+
|      200000 |
+-------------+

What this shows is that you can not only select from tables in the database, but you can also select from the results of a select statement, which is called a subquery. Note that “table_alias_1” is just a necessary part of the MariaDB syntax – without naming it that, an error would appear: “ERROR 1248 (42000): Every derived table must have its own alias”.

Note that the “) table_alias_1” could have just as easily read “) AS table_alias_1”. “AS” is optional. However, table_alias_1 is not a very good name for the subquery. Sometimes it’s difficult to think of a good name for a subquery so we end up naming them things like “step1”, “step2”, etc., but in this case, an apt name could be “employee_salaries” or “distinct_employee_salaries”.

Connecting to MariaDB (especially in Python)

The Password File

First, set up your .my.cnf file: https://dev.mysql.com/doc/refman/8.0/en/option-files.html
Here is an example:

[client]
host=localhost
port=3306
user=mica
password='your user db password here'
database=db1

This file allows you to connect to the DB more easily. With it, all you have to type (assuming your db is on the same machine you’re running commands from): “mysql db1”. Without it, you would have to type: “mysql -u mica db1 -p” and then type your password at the prompt every time you want to connect. (Note that you should never type your password in plaintext on the command line using “-p=mypassword” or “–password=mypassword”, because other processes on the computer can see what you typed in and steal your password. Use the .my.cnf file or use the “-p|–password” without an argument and type it in at the prompt where your password is hidden from view.)

Connecting to MariaDB in Python

Original webpage: https://mariadb.com/resources/blog/how-connect-python-programs-mariadb
MySQL-python client: https://pypi.org/project/MySQL-python/

If you have anaconda (see my post Jupyter Notebook Introduction and Basic Installation for installing anaconda):

# this creates an environment called "mariadb" that installs a library called "mysqlclient" using the channel "bioconda".
conda create -n mariadb -c bioconda mysqlclient

Otherwise, create a virtualenv by following this tutorial: https://packaging.python.org/guides/installing-using-pip-and-virtualenv/

# note that "~" is synonymous with your home folder, e.g. /Users/mica in mac or /home/mica in linux.
# on windows replace "~" with "YOUR_HOME_FOLDER", wherever your home folder is.

# create the environment.
virtualenv ~/python3

# activate the environment
source ~/python3/bin/activate

# confirm that the environment is active.
which pip
# should return something like ~/python3/bin/pip

# follow the pip command as mentioned by the link mentioned above:
pip install MySQL-python

Finally, some python code to confirm that the installation is working:

Additional Notes

This demonstrates proper use of query parameters. It properly formats the data types for the query (e.g. dates/integers/strings/etc.), and is also the proper use in production code for preventing injection attacks:

cursor.execute(
    'select * from companies where SNUM=%(snum)s',
    params={
        'snum': 'S1',
    },
)