Introduction

Ever since I finished the students regrouping project I felt that there’s more I can do with it. At the same time I started to notice that as students get older, they tend to come to classes by themselves rather than being brought by parents, and I don’t have a chance to talk to them as I used to when they came after each class for feedback. I figured I could give them a rundown on students’ performance and send it directly via messengers. I tried doing that for a while a found that it takes quite a lot of time to put it all together even having data at hand, moreover I had to stay after hours to do it, which is obviously not ideal. So while I was thinking of ways to simplify it and make it faster, I started learning Python for data analysis just to see if it’s that much different from R and if I could find the use of it. While reading the material on things that Python can be used for I found that there is a thing called API that is utilized to make interaction between certain applications possible. And then it hit me, I had no doubt that I can make Python go to the database and pull necessary data and turn it to a kind of a report, and then use some messenger’s API to send it to a parent. This way I won’t have to write these messages manually, instead, I will only have to input data into the database and run code that creates reports and sends them.

<aside> <img src="/icons/code_lightgray.svg" alt="/icons/code_lightgray.svg" width="40px" /> All of the code below can be found in my Github repository.

</aside>

Database design

I’ve wanted to try another flavor of a DBMS for a while and starting a new project was a perfect opportunity to do so. On the podcasts and Youtube videos people have been talking about so called analytical databases such as BigQuery and ClickHouse. I decided I should choose one to build the project around. Since BigQuery is a Google’s darling and Google doesn’t work particularly well in the place I was at the time, I went with ClickHouse that was readily accessible. By then I already had my JupyterLab running on AWS EC2 t2.micro instance, that I found extremely reliable and convenient while being free, so I used my AWS account to deploy a ClickHouse database too. The database schema from the previous project turned out to be serving my needs quite well and with some minor adjustments, such as using numbers rather than letters for assessment, I adopted the same schema for the ClickHouse database.

https://dbdiagram.io/e/66175cbc03593b6b61b86a37/66177f4b03593b6b61ba6dc3

Data input

One of the main problems I encountered while doing my previous project was the data input, every time after the class I had to manually type the data in a following format:

INSERT INTO attendance (att_date,un_lsn,un_lsn_2,homework,student_id,comprehension,speaking,behaviour,vocabulary,reading,writing)
VALUES
('2022-11-16', 03.05, NULL, NULL, 2, 5,5,4,4,4,5),
('2022-11-16', 03.05, NULL, NULL, 1, 5,5,4,5,3,4),
('2022-11-16', 03.05, NULL, NULL, 3, 5,5,5,5,5,5);

This method was quite tedious and error prone because of its manual nature. If there is a skipped comma, the query will throw an error, if there is a quotation mark that is not closed, the query will throw an error, if there is a value that is missing, the query will throw an error. It is further complicated by the fact that in general these rows are difficult to read and check especially when there are about thirty of them. So although there was no way to automate the data input, I had an idea to make it less painful and unreliable, for that I used Python’s appendand inputmethods. First it was necessary to create a bunch of empty lists that will house each type of data:

dt=[] #will receive dates in a standard format YYYY-MM-DD
st_id=[] #student ids
bk_un_lsn=[] #book unit and lesson taught
bk_un_lsn_2=[] #used if the there is a double class
hwrk=[] #homework grade ranged from 1 to 5, where 5 is the highest
beh=[] #behaviour grade ranged from 1 to 5, where 5 is the highest
comp=[] #comprehension grade ranged from 1 to 5, where 5 is the highest
vocab=[] #vocabulary grade ranged from 1 to 5, where 5 is the highest
spk=[] #speaking grade ranged from 1 to 5, where 5 is the highest
rd=[] #reading grade ranged from 1 to 5, where 5 is the highest
wrt=[] #writing grade ranged from 1 to 5, where 5 is the highest

Afterwards append and input are used to fill each list:

dt.append(input('Enter the date:'))
st_id.append(int(input('Enter student id:')))
bk_un_lsn.append(input('Enter book, unit and lesson:'))
bk_un_lsn_2.append(input('Enter book, unit and lesson:')or None)
hwrk.append(input('Enter homework grade:')or None)
beh.append(input('Enter behaviour grade:')or None)
comp.append(input('Enter comprehension grade:')or None)
vocab.append(input('Enter vocabulary grade:')or None)
spk.append(input('Enter speaking grade:')or None)
rd.append(input('Enter reading grade:')or None)
wrt.append(input('Enter writing grade:')or None)

Using or None simplifies the data input, if the input field is left blank the list will be filled with None, which is ideal for indicating that there is no data available. This is what inserting data looks now:

Untitled

After defining the dataframe structure with a dictionary, the lists can be turned to a Pandas dataframe, which should be inserted into the ClickHouse database, but before the ‘date’ column has to be set to datetime type so that ClickHouse parses the column as date too.

df_structure = {
'date': dt,
'student_id': st_id,
'book_unit_lesson': bk_un_lsn,
'book_unit_lesson_2' : bk_un_lsn_2,
'homework' : hwrk,
'behaviour' : beh,
'comprehension' : comp,
'vocabulary' : vocab,
'speaking' : spk,
'reading' : rd,
'writing' : wrt
}

df = pd.DataFrame(df_structure)
df['date'] = pd.to_datetime(df['date'])
date student_id book_unit_lesson book_unit_lesson_2 homework behaviour comprehension vocabulary speaking reading writing
0 2024-04-10 2 HC1U7L7 HC1U7L8 5 4 4 5 5 None 2
1 2024-04-10 5 HC1U7L7 HC1U7L8 1 3 4 5 5 None 2
2 2024-04-10 8 HC1U7L7 HC1U7L8 1 3 5 5 4 None 1
3 2024-04-10 3 HC1U7L7 HC1U7L8 5 5 4 5 4 None 3

Interaction between Python and ClickHouse