add hw on pandas #12

Open
ipsemenov wants to merge 2 commits into main from pandas

Conversation

@ipsemenov
Owner

No description provided.

@krglkvrmn krglkvrmn left a comment

Overall this is fine. Two remarks: there is no file listing the dependencies, and the EDA does too little work with the tables.

Comment thread hw9/code/intro_pandas.py


# read dataframe
df = pd.read_csv('../data/train.csv')

It is better to always build file paths with os.path.join.
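A minimal sketch of what that could look like for the path in this file. The names `DATA_DIR` and `train_path` are illustrative and do not appear in the PR.

```python
import os

# Build the path from its components instead of hard-coding '/' separators.
# DATA_DIR and train_path are hypothetical names used only for this sketch.
DATA_DIR = os.path.join('..', 'data')
train_path = os.path.join(DATA_DIR, 'train.csv')

# The script would then read the file as before:
# df = pd.read_csv(train_path)
```

The result still depends on the current working directory, exactly like the original relative path; os.path.join only takes care of the separator.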

Comment thread hw9/code/intro_pandas.py
Comment on lines +13 to +23
fig, ax = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Distributions nucleotides over positions in reads', fontsize=18)
ax[0, 0].hist(df['A'].dropna(), bins=30)
ax[0, 1].hist(df['C'].dropna(), bins=30)
ax[1, 0].hist(df['T'].dropna(), bins=30)
ax[1, 1].hist(df['G'].dropna(), bins=30)

ax[0, 0].set_title('A', fontsize=16)
ax[0, 1].set_title('C', fontsize=16)
ax[1, 0].set_title('T', fontsize=16)
ax[1, 1].set_title('G', fontsize=16)

It would be convenient to show all of this on a single plot. A poor option is overlaid histograms with transparency. A better one is a grouped barplot or a stacked barplot.
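One possible way to get grouped bars while staying close to the existing code is to pass all four columns to a single `hist` call. This is a sketch on made-up data: the real contents of `train.csv` are not shown in the PR, so the `toy` frame below is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the A/C/T/G columns of the real dataframe.
rng = np.random.default_rng(0)
nucleotides = ['A', 'C', 'T', 'G']
toy = pd.DataFrame({n: rng.normal(loc, 5, 200)
                    for n, loc in zip(nucleotides, (20, 25, 30, 35))})

fig, ax = plt.subplots(figsize=(10, 6))
# A list of arrays makes hist draw the series side by side within each bin.
counts, bin_edges, _ = ax.hist(
    [toy[n].dropna().to_numpy() for n in nucleotides],
    bins=30, label=nucleotides)
ax.set_title('Distributions of nucleotides over positions in reads', fontsize=18)
ax.legend()
```

Passing `stacked=True` to the same call should give the stacked variant instead of the grouped one.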

Comment thread hw9/code/intro_pandas.py
Comment on lines +113 to +119
rrna_annotation_df = pd.read_csv(path_to_file, header=1, sep='\t')
rrna_annotation_df = pd.DataFrame(np.r_[np.array(rrna_annotation_df.columns)[np.newaxis, :],
rrna_annotation_df.values])
rrna_annotation_df.columns = ['chromosome', 'source', 'type', 'start', 'end',
'score', 'strand', 'phase', 'attributes']
rrna_annotation_df[['start', 'end']] = rrna_annotation_df[['start', 'end']].astype(int)
rrna_annotation_df['score'] = rrna_annotation_df['score'].astype(float)

I did not quite understand what all this juggling with indices is for. Is it to get rid of the comment lines?

Suggested change
- rrna_annotation_df = pd.read_csv(path_to_file, header=1, sep='\t')
- rrna_annotation_df = pd.DataFrame(np.r_[np.array(rrna_annotation_df.columns)[np.newaxis, :],
-                                          rrna_annotation_df.values])
- rrna_annotation_df.columns = ['chromosome', 'source', 'type', 'start', 'end',
-                               'score', 'strand', 'phase', 'attributes']
- rrna_annotation_df[['start', 'end']] = rrna_annotation_df[['start', 'end']].astype(int)
- rrna_annotation_df['score'] = rrna_annotation_df['score'].astype(float)
+ rrna_annotation_df = pd.read_table(path_to_file, header=None,
+                                    names=['chromosome', 'source', 'type', 'start', 'end',
+                                           'score', 'strand', 'phase', 'attributes'],
+                                    comment="#")
+ rrna_annotation_df[['start', 'end']] = rrna_annotation_df[['start', 'end']].astype(int)
+ rrna_annotation_df['score'] = rrna_annotation_df['score'].astype(float)
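
To illustrate how the suggested call behaves, here is a small self-contained sketch on an in-memory, made-up GFF-like snippet. The records and the `GFF_COLUMNS` name are invented for the example; the real annotation file is not shown in the PR.

```python
import io
import pandas as pd

GFF_COLUMNS = ['chromosome', 'source', 'type', 'start', 'end',
               'score', 'strand', 'phase', 'attributes']

# Made-up records; header/comment lines start with '#'.
gff_text = (
    "##gff-version 3\n"
    "# made-up records for illustration\n"
    "chr1\tbarrnap\trRNA\t100\t1600\t0.0\t+\t.\tName=16S_rRNA\n"
    "chr2\tbarrnap\trRNA\t500\t3400\t1.5e-10\t-\t.\tName=23S_rRNA\n"
)

# comment="#" skips the comment lines; header=None plus names= sets the columns.
annotation = pd.read_table(io.StringIO(gff_text), header=None,
                           names=GFF_COLUMNS, comment="#")
print(annotation.dtypes)
```

One caveat: `comment="#"` also cuts off the rest of any data line at the first `#`, so this only works if the attributes column never contains that character.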

Comment thread hw9/code/intro_pandas.py
Comment on lines +135 to +136
rna_stats_df = rrna_annotation_df.groupby(
['chromosome', 'rna_type'], as_index=False).size().reset_index(name='counts')

image

Maybe this comes down to the pandas version, but I could not check, because the list of dependencies is not provided :)
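
The screenshot is not preserved here, so the following is only a guess at the cause. In newer pandas releases `groupby(..., as_index=False).size()` returns a DataFrame rather than a Series, and `DataFrame.reset_index` does not accept a `name` argument. A form that should not depend on that behaviour drops `as_index=False`. The sketch uses made-up data in `toy`.

```python
import pandas as pd

# Made-up stand-in for rrna_annotation_df with the two grouping columns.
toy = pd.DataFrame({
    'chromosome': ['chr1', 'chr1', 'chr1', 'chr2'],
    'rna_type': ['16S', '16S', '23S', '5S'],
})

# With the default as_index=True, size() returns a Series indexed by the
# group keys, and Series.reset_index(name=...) names the counts column.
rna_stats_df = (
    toy.groupby(['chromosome', 'rna_type'])
       .size()
       .reset_index(name='counts')
)
print(rna_stats_df)
```

Pinning the pandas version in a dependency file would also make the original behaviour reproducible.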

Comment thread hw9/code/intro_pandas.py
Comment on lines +147 to +148
is_inside = df_intersected[['start_x', 'end_x', 'start_y', 'end_y']].apply(
lambda x: x[0] > x[2] and x[1] < x[3], axis=1)

You could simply have compared the columns directly, but this works too.
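
A sketch of the column-wise comparison, on a made-up `df_intersected` with the same four column names as in the PR:

```python
import pandas as pd

# Made-up intervals; only the column names are taken from the PR.
df_intersected = pd.DataFrame({
    'start_x': [10, 5, 50],
    'end_x':   [20, 30, 60],
    'start_y': [5, 10, 40],
    'end_y':   [25, 20, 55],
})

# Element-wise comparisons on whole columns, combined with & instead of `and`.
is_inside = (
    (df_intersected['start_x'] > df_intersected['start_y'])
    & (df_intersected['end_x'] < df_intersected['end_y'])
)
print(is_inside.tolist())  # [True, False, False]
```

Besides being shorter, this avoids the positional `x[0]` lookups on a label-indexed row, which recent pandas versions warn about.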


2 participants