import pandas as pd

from ydata_profiling import ProfileReport
from ydata_profiling.utils.cache import cache_file

# Download the dataset once and reuse the cached copy
file_name = cache_file(
    "pollution_us_2000_2016.csv",
    "https://query.data.world/s/mz5ot3l4zrgvldncfgxu34nda45kvb",
)

df = pd.read_csv(file_name, index_col=[0])

# Filter the data to profile the time series of a single site
site = df[df["Site Num"] == 3003]

# Set tsmode to True to automatically identify time-series variables.
# sortby names the column that gives the chronological order of the time series.
# The filtered frame (site) is profiled, not the full df.
profile = ProfileReport(site, tsmode=True, sortby="Date Local", title="Time-Series EDA")

profile.to_file("report_timeseries.html")
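To enable time-series report generation, set tsmode to True. When it is True, variables with temporal dependence are identified automatically based on their autocorrelation. The time-series report uses the sortby attribute to order the dataset; if sortby is not provided, the dataset is assumed to be already ordered. You can set the correlation level used to detect time-series variables by setting the x configuration.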
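As a rough illustration of the two ideas above (ordering the data first, then checking for autocorrelation), the following sketch uses plain pandas rather than ydata-profiling. The helper looks_like_time_series, the 0.7 threshold, the lag of 1, and the synthetic columns are all assumptions made for this example; they are not the library's actual detection logic or defaults.

```python
import numpy as np
import pandas as pd


def looks_like_time_series(series: pd.Series, threshold: float = 0.7, lag: int = 1) -> bool:
    """Hypothetical helper: True if the lag autocorrelation reaches the threshold."""
    value = series.autocorr(lag=lag)
    # autocorr returns NaN for constant or too-short series; treat that as "not a time series"
    return bool(pd.notna(value) and value >= threshold)


rng = np.random.default_rng(0)
n = 500
frame = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=n, freq="D"),
        # Slowly rising signal with a little noise: strongly autocorrelated
        "trend": np.linspace(0.0, 10.0, n) + rng.normal(0.0, 0.1, n),
        # White noise: no temporal dependence
        "noise": rng.normal(0.0, 1.0, n),
    }
)

# Rows out of chronological order, as if no sort column had been applied
shuffled = frame.sample(frac=1.0, random_state=0)
# Sorting by the date column restores the chronological order
ordered = shuffled.sort_values("date")

columns = ["trend", "noise"]
flags_shuffled = {name: looks_like_time_series(shuffled[name]) for name in columns}
flags_ordered = {name: looks_like_time_series(ordered[name]) for name in columns}

print("shuffled:", flags_shuffled)
print("ordered: ", flags_ordered)
```

In this sketch the trend column is only flagged once the rows are in chronological order, which is why providing sortby matters when the dataset is not already sorted.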
import pandas as pd

from ydata_profiling import ProfileReport
from ydata_profiling.utils.cache import cache_file

# Download the dataset once and reuse the cached copy
file_name = cache_file(
    "pollution_us_2000_2016.csv",
    "https://query.data.world/s/mz5ot3l4zrgvldncfgxu34nda45kvb",
)

df = pd.read_csv(file_name, index_col=[0])

# Filter the data to profile the time series of a single site
site = df[df["Site Num"] == 3003]

# Explicitly declare which variables are time series
type_schema = {
    "NO2 Mean": "timeseries",
    "NO2 1st Max Value": "timeseries",
    "NO2 1st Max Hour": "timeseries",
    "NO2 AQI": "timeseries",
}

# The filtered frame (site) is profiled, matching the report title
profile = ProfileReport(
    site,
    tsmode=True,
    type_schema=type_schema,
    sortby="Date Local",
    title="Time-Series EDA for site 3003",
)

profile.to_file("report_timeseries.html")