Human similarity vs. desc len: lines of fit ---------------------------------------
Correlation between desc len and rouge1_p: PearsonRResult(statistic=-0.6602644633231723, pvalue=0.0)
Correlation between desc len and rouge1_r: PearsonRResult(statistic=0.39656910466793505, pvalue=2.1583441443783216e-165)
Correlation between desc len and rouge1_f1: PearsonRResult(statistic=-0.5226627431027959, pvalue=1.3478405871607433e-306)
Correlation between desc len and rouge2_p: PearsonRResult(statistic=-0.6881062570995833, pvalue=0.0)
Correlation between desc len and rouge2_r: PearsonRResult(statistic=0.03552440801869153, pvalue=0.018541627242542887)
Correlation between desc len and rouge2_f1: PearsonRResult(statistic=-0.534016558329884, pvalue=1.8e-322)
Correlation between desc len and rougeL_p: PearsonRResult(statistic=-0.7130005463919122, pvalue=0.0)
Correlation between desc len and rougeL_r: PearsonRResult(statistic=0.24109303417191902, pvalue=3.9648195401793274e-59)
Correlation between desc len and rougeL_f1: PearsonRResult(statistic=-0.535470389251286, pvalue=0.0)
Correlation between desc len and rougeLsum_p: PearsonRResult(statistic=-0.705441706044769, pvalue=0.0)
Correlation between desc len and rougeLsum_r: PearsonRResult(statistic=0.2652578254888701, pvalue=1.2276053446779567e-71)
Correlation between desc len and rougeLsum_f1: PearsonRResult(statistic=-0.5225498990926154, pvalue=1.9250794058526996e-306)
Correlation between desc len and bleu: PearsonRResult(statistic=-0.23287141541112416, pvalue=3.547000167389659e-55)
Correlation between desc len and chrf: PearsonRResult(statistic=0.05187430336617819, pvalue=0.0005826797256125494)
Correlation between desc len and bleurt: PearsonRResult(statistic=-0.043188877210770305, pvalue=0.004195611690166321)
Correlation between desc len and bertscore_p: PearsonRResult(statistic=-0.4276676661902241, pvalue=7.075219450209847e-195)
Correlation between desc len and bertscore_r: PearsonRResult(statistic=-0.011449864370309686, pvalue=0.44803048432949394)
Correlation between desc len and bertscore_f1: PearsonRResult(statistic=-0.2979819502949841, pvalue=8.798498492723582e-91)
BLIP vs. desc len: lines of fit ---------------------------------------
Correlation between desc len and probs: PearsonRResult(statistic=-0.0019565402063631433, pvalue=0.8769141286108553)
Correlation between desc len and cos_sims: PearsonRResult(statistic=-0.021037680095395923, pvalue=0.0958024981723081)
Correlation between blipscores and human sims --------------------------
Correlation between blip-probs and rouge1_p: PearsonRResult(statistic=0.04923347749986574, pvalue=0.0010976144295786329)
Correlation between blip-probs and rouge1_r: PearsonRResult(statistic=0.07329813887714574, pvalue=1.1542163934383586e-06)
Correlation between blip-probs and rouge1_f1: PearsonRResult(statistic=0.08087838959930257, pvalue=7.967375182096399e-08)
Correlation between blip-probs and rouge2_p: PearsonRResult(statistic=0.060270685256583446, pvalue=6.409054409384896e-05)
Correlation between blip-probs and rouge2_r: PearsonRResult(statistic=0.07616080590341065, pvalue=4.330579082377632e-07)
Correlation between blip-probs and rouge2_f1: PearsonRResult(statistic=0.07927553628710321, pvalue=1.4316052102814765e-07)
Correlation between blip-probs and rougeL_p: PearsonRResult(statistic=0.05909672697260255, pvalue=8.88342588369314e-05)
Correlation between blip-probs and rougeL_r: PearsonRResult(statistic=0.06688160969608874, pvalue=9.139081415599741e-06)
Correlation between blip-probs and rougeL_f1: PearsonRResult(statistic=0.08781340040327923, pvalue=5.547562883203563e-09)
Correlation between blip-probs and rougeLsum_p: PearsonRResult(statistic=0.05829496874068637, pvalue=0.00011065224128377396)
Correlation between blip-probs and rougeLsum_r: PearsonRResult(statistic=0.06619181105040704, pvalue=1.1296495073035874e-05)
Correlation between blip-probs and rougeLsum_f1: PearsonRResult(statistic=0.08761727735897788, pvalue=5.999058059856348e-09)
Correlation between blip-probs and bleu: PearsonRResult(statistic=0.06531296745931268, pvalue=1.4754717326682747e-05)
Correlation between blip-probs and chrf: PearsonRResult(statistic=0.09054840975923402, pvalue=1.8305874968860283e-09)
Correlation between blip-probs and bleurt: PearsonRResult(statistic=0.10354471809764468, pvalue=6.008038817631731e-12)
Correlation between blip-probs and bertscore_p: PearsonRResult(statistic=0.074275751183701, pvalue=8.291308080738817e-07)
Correlation between blip-probs and bertscore_r: PearsonRResult(statistic=0.09424051774734123, pvalue=3.88975328663038e-10)
Correlation between blip-probs and bertscore_f1: PearsonRResult(statistic=0.10901849894911886, pvalue=4.3160632940413376e-13)
Correlation between blip-sims and rouge1_p: PearsonRResult(statistic=-0.0006716185104100117, pvalue=0.9645042439923877)
Correlation between blip-sims and rouge1_r: PearsonRResult(statistic=0.03726044471584945, pvalue=0.013520136076733564)
Correlation between blip-sims and rouge1_f1: PearsonRResult(statistic=0.011142656204038145, pvalue=0.46030496574342883)
Correlation between blip-sims and rouge2_p: PearsonRResult(statistic=-0.013859265100263841, pvalue=0.3584247898740601)
Correlation between blip-sims and rouge2_r: PearsonRResult(statistic=0.019256794020779587, pvalue=0.20192376793861117)
Correlation between blip-sims and rouge2_f1: PearsonRResult(statistic=-0.0058252708824506755, pvalue=0.6995035820892026)
Correlation between blip-sims and rougeL_p: PearsonRResult(statistic=-0.0020798590490085367, pvalue=0.8903879141802504)
Correlation between blip-sims and rougeL_r: PearsonRResult(statistic=0.03673438353328133, pvalue=0.014897010209634881)
Correlation between blip-sims and rougeL_f1: PearsonRResult(statistic=0.012731578419873157, pvalue=0.3988709534633897)
Correlation between blip-sims and rougeLsum_p: PearsonRResult(statistic=-0.014656684887341155, pvalue=0.3314404600098334)
Correlation between blip-sims and rougeLsum_r: PearsonRResult(statistic=0.0166681873250025, pvalue=0.2693654832144231)
Correlation between blip-sims and rougeLsum_f1: PearsonRResult(statistic=-0.00942451926958595, pvalue=0.5323055007361664)
Correlation between blip-sims and bleu: PearsonRResult(statistic=-0.032506072825112836, pvalue=0.03120495616550004)
Correlation between blip-sims and chrf: PearsonRResult(statistic=-0.017517266018863276, pvalue=0.2457240701672574)
Correlation between blip-sims and bleurt: PearsonRResult(statistic=0.0679230830053681, pvalue=6.610927221922755e-06)
Correlation between blip-sims and bertscore_p: PearsonRResult(statistic=0.07748918143002405, pvalue=2.7149379872200915e-07)
Correlation between blip-sims and bertscore_r: PearsonRResult(statistic=0.11925164370200289, pvalue=2.1909896050301923e-15)
Correlation between blip-sims and bertscore_f1: PearsonRResult(statistic=0.1264028264079537, pvalue=4.1243908727727453e-17)
Correlation between error occurrences and blipscores + lens --------------------------
Correlation between correct and desc len: PearsonRResult(statistic=-0.1313330388930739, pvalue=0.000422153561406421)
Correlation between correct and BLIP prob: PearsonRResult(statistic=0.1263687800472979, pvalue=0.0006952143890300909)
Correlation between correct and BLIP cos sim: PearsonRResult(statistic=0.2374566764238572, pvalue=1.198087749491074e-10)
Correlation between label error and desc len: PearsonRResult(statistic=0.1497019377710952, pvalue=5.715680430938068e-05)
Correlation between label error and BLIP prob: PearsonRResult(statistic=0.05246248742290883, pvalue=0.16052862204642376)
Correlation between label error and BLIP cos sim: PearsonRResult(statistic=-0.032215436831468144, pvalue=0.3890471839398213)
Correlation between identity error and desc len: PearsonRResult(statistic=0.051500012263616425, pvalue=0.16835260769185184)
Correlation between identity error and BLIP prob: PearsonRResult(statistic=-0.08031756272080333, pvalue=0.03152561934699308)
Correlation between identity error and BLIP cos sim: PearsonRResult(statistic=-0.16621861492725667, pvalue=7.669553579963667e-06)
Correlation between value error and desc len: PearsonRResult(statistic=0.0434265926541835, pvalue=0.24550096218360398)
Correlation between value error and BLIP prob: PearsonRResult(statistic=0.03573248580937964, pvalue=0.33935556951051427)
Correlation between value error and BLIP cos sim: PearsonRResult(statistic=-0.06199442978075031, pvalue=0.09717318300463404)
Correlation between deceptive error and desc len: PearsonRResult(statistic=0.20067031414027525, pvalue=5.987915300685724e-08)
Correlation between deceptive error and BLIP prob: PearsonRResult(statistic=0.011525966250119021, pvalue=0.7580055712059328)
Correlation between deceptive error and BLIP cos sim: PearsonRResult(statistic=0.02475069978000779, pvalue=0.5081693118367532)
Correlation between trend error and desc len: PearsonRResult(statistic=-0.03932231530392432, pvalue=0.2930301395222393)
Correlation between trend error and BLIP prob: PearsonRResult(statistic=0.04761081588138853, pvalue=0.20288828070255516)
Correlation between trend error and BLIP cos sim: PearsonRResult(statistic=0.09964715188479659, pvalue=0.0075799180232648915)
Correlation between chart type error and desc len: PearsonRResult(statistic=-0.059439788040734555, pvalue=0.11178075077022183)
Correlation between chart type error and BLIP prob: PearsonRResult(statistic=0.05680761705040406, pvalue=0.12858571433992944)
Correlation between chart type error and BLIP cos sim: PearsonRResult(statistic=0.11043815958234197, pvalue=0.0030655312472440154)
Correlation between cutoff and desc len: PearsonRResult(statistic=0.27707730298324373, pvalue=4.197747444660734e-14)
Correlation between cutoff and BLIP prob: PearsonRResult(statistic=-0.11664571023277268, pvalue=0.0017561628380628404)
Correlation between cutoff and BLIP cos sim: PearsonRResult(statistic=-0.12580089604612754, pvalue=0.0007352219917848294)
Correlation between unecessary context and desc len: PearsonRResult(statistic=0.08987478669551502, pvalue=0.016073670996464333)
Correlation between unecessary context and BLIP prob: PearsonRResult(statistic=-0.08471902171417865, pvalue=0.02329079446561007)
Correlation between unecessary context and BLIP cos sim: PearsonRResult(statistic=0.012984706427215605, pvalue=0.728517994527239)
Correlation between nonsense error and desc len: PearsonRResult(statistic=0.016898847453795163, pvalue=0.6514557584110118)
Correlation between nonsense error and BLIP prob: PearsonRResult(statistic=0.014732708893914353, pvalue=0.6937075227327799)
Correlation between nonsense error and BLIP cos sim: PearsonRResult(statistic=-0.01749483551060644, pvalue=0.6400169312367006)
Correlation between grammar error and desc len: PearsonRResult(statistic=0.060512219708199554, pvalue=0.10545077109809163)
Correlation between grammar error and BLIP prob: PearsonRResult(statistic=0.03606621304248141, pvalue=0.3348611870109457)
Correlation between grammar error and BLIP cos sim: PearsonRResult(statistic=-0.08370337262972588, pvalue=0.025004260694106994)
Correlation between repetition and desc len: PearsonRResult(statistic=0.19229607234639345, pvalue=2.118816852714981e-07)
Correlation between repetition and BLIP prob: PearsonRResult(statistic=-0.008501575277339843, pvalue=0.8202275066928189)
Correlation between repetition and BLIP cos sim: PearsonRResult(statistic=-0.1472140111990357, pvalue=7.601543989290034e-05)
Correlation between missing context and desc len: PearsonRResult(statistic=-0.1613511475072061, pvalue=1.4156781194195227e-05)
Correlation between missing context and BLIP prob: PearsonRResult(statistic=-0.13954107107819608, pvalue=0.000178033564664287)
Correlation between missing context and BLIP cos sim: PearsonRResult(statistic=-0.239189605370972, pvalue=8.699686373800033e-11)
Correlation between axis error and desc len: PearsonRResult(statistic=0.08640268230151574, pvalue=0.02067408971593123)
Correlation between axis error and BLIP prob: PearsonRResult(statistic=-0.009639670500187204, pvalue=0.7966572083847825)
Correlation between axis error and BLIP cos sim: PearsonRResult(statistic=-0.0096292717051898, pvalue=0.7968717700653057)
Correlation between val-correct and desc len: PearsonRResult(statistic=-0.11005179666120075, pvalue=0.003170891030535227)
Correlation between val-correct and BLIP prob: PearsonRResult(statistic=-0.04908849964132993, pvalue=0.18920523525537547)
Correlation between val-correct and BLIP cos sim: PearsonRResult(statistic=-0.0028973990959829804, pvalue=0.9382671293547149)
Correlation between error occurrences and human sims --------------------------
Correlation between correct and rouge1_p: PearsonRResult(statistic=0.13322786933053454, pvalue=0.002864422842809617)
Correlation between correct and rouge1_r: PearsonRResult(statistic=0.04579252975915881, pvalue=0.3073028432687153)
Correlation between correct and rouge1_f1: PearsonRResult(statistic=0.16514313685079962, pvalue=0.00021119292254426575)
Correlation between correct and rouge2_p: PearsonRResult(statistic=0.21058958490784652, pvalue=2.0774601731294307e-06)
Correlation between correct and rouge2_r: PearsonRResult(statistic=0.1669486875103909, pvalue=0.00017944591796110012)
Correlation between correct and rouge2_f1: PearsonRResult(statistic=0.24174711565821283, pvalue=4.551738150532204e-08)
Correlation between correct and rougeL_p: PearsonRResult(statistic=0.16590532372571004, pvalue=0.00019719859739626084)
Correlation between correct and rougeL_r: PearsonRResult(statistic=0.07072619647017821, pvalue=0.11458449726715274)
Correlation between correct and rougeL_f1: PearsonRResult(statistic=0.20038312604421854, pvalue=6.457598503718232e-06)
Correlation between correct and rougeLsum_p: PearsonRResult(statistic=0.15690808644520765, pvalue=0.0004346715014188)
Correlation between correct and rougeLsum_r: PearsonRResult(statistic=0.05528629939221126, pvalue=0.21763240983493787)
Correlation between correct and rougeLsum_f1: PearsonRResult(statistic=0.1827265779355272, pvalue=4.022441216778547e-05)
Correlation between correct and bleu: PearsonRResult(statistic=0.19064706270273013, pvalue=1.807454662897648e-05)
Correlation between correct and chrf: PearsonRResult(statistic=0.19261351647806466, pvalue=1.4742310974407448e-05)
Correlation between correct and bleurt: PearsonRResult(statistic=0.033098376822522174, pvalue=0.4606909222035294)
Correlation between correct and bertscore_p: PearsonRResult(statistic=0.09287793857497473, pvalue=0.03807767704987623)
Correlation between correct and bertscore_r: PearsonRResult(statistic=0.12651803909250128, pvalue=0.004647631023470036)
Correlation between correct and bertscore_f1: PearsonRResult(statistic=0.14373823039051306, pvalue=0.0012839198428282218)
Correlation between label error and rouge1_p: PearsonRResult(statistic=-0.19090983937461986, pvalue=1.759109458143567e-05)
Correlation between label error and rouge1_r: PearsonRResult(statistic=0.0783766353133849, pvalue=0.08027339110456294)
Correlation between label error and rouge1_f1: PearsonRResult(statistic=-0.17398912015342688, pvalue=9.356061821842161e-05)
Correlation between label error and rouge2_p: PearsonRResult(statistic=-0.23262663485156618, pvalue=1.4744262212106757e-07)
Correlation between label error and rouge2_r: PearsonRResult(statistic=-0.060013671143153334, pvalue=0.180750315542043)
Correlation between label error and rouge2_f1: PearsonRResult(statistic=-0.2160787810585721, pvalue=1.1024705462752664e-06)
Correlation between label error and rougeL_p: PearsonRResult(statistic=-0.2568534065984788, pvalue=5.841786013107687e-09)
Correlation between label error and rougeL_r: PearsonRResult(statistic=-0.024390829593516725, pvalue=0.5867413306487109)
Correlation between label error and rougeL_f1: PearsonRResult(statistic=-0.26907680172484627, pvalue=1.0046763831467771e-09)
Correlation between label error and rougeLsum_p: PearsonRResult(statistic=-0.2380806433319677, pvalue=7.343026776095199e-08)
Correlation between label error and rougeLsum_r: PearsonRResult(statistic=0.0026356141988325386, pvalue=0.9531690674257544)
Correlation between label error and rougeLsum_f1: PearsonRResult(statistic=-0.2355566232366336, pvalue=1.0160185620516088e-07)
Correlation between label error and bleu: PearsonRResult(statistic=-0.1202290604418374, pvalue=0.007172276724518166)
Correlation between label error and chrf: PearsonRResult(statistic=-0.06827211311877472, pvalue=0.12774965167498054)
Correlation between label error and bleurt: PearsonRResult(statistic=-0.14223763548453414, pvalue=0.001444587575265051)
Correlation between label error and bertscore_p: PearsonRResult(statistic=-0.18753140066264867, pvalue=2.4857464580562693e-05)
Correlation between label error and bertscore_r: PearsonRResult(statistic=-0.12926229794638078, pvalue=0.003823072205312722)
Correlation between label error and bertscore_f1: PearsonRResult(statistic=-0.20675879520474144, pvalue=3.201064751389267e-06)
Correlation between identity error and rouge1_p: PearsonRResult(statistic=0.01947282951075294, pvalue=0.6643317964252385)
Correlation between identity error and rouge1_r: PearsonRResult(statistic=0.010599632401760654, pvalue=0.8132847174887387)
Correlation between identity error and rouge1_f1: PearsonRResult(statistic=0.027752506192794395, pvalue=0.5362402557797697)
Correlation between identity error and rouge2_p: PearsonRResult(statistic=-0.08080190110870905, pvalue=0.0713253022980621)
Correlation between identity error and rouge2_r: PearsonRResult(statistic=-0.11286957606741704, pvalue=0.011633881356993734)
Correlation between identity error and rouge2_f1: PearsonRResult(statistic=-0.1114093666501722, pvalue=0.012766693725612453)
Correlation between identity error and rougeL_p: PearsonRResult(statistic=-0.04962896875215761, pvalue=0.26849769901315373)
Correlation between identity error and rougeL_r: PearsonRResult(statistic=-0.07796289061558499, pvalue=0.08188706012065897)
Correlation between identity error and rougeL_f1: PearsonRResult(statistic=-0.08818514475307401, pvalue=0.048975648449768164)
Correlation between identity error and rougeLsum_p: PearsonRResult(statistic=-0.044444363370049964, pvalue=0.3217775593605133)
Correlation between identity error and rougeLsum_r: PearsonRResult(statistic=-0.07292778051634521, pvalue=0.10370208735201819)
Correlation between identity error and rougeLsum_f1: PearsonRResult(statistic=-0.07921686561929976, pvalue=0.07707562741613308)
Correlation between identity error and bleu: PearsonRResult(statistic=-0.013461250868729341, pvalue=0.7642068826263941)
Correlation between identity error and chrf: PearsonRResult(statistic=-0.06857080562001439, pvalue=0.1260872364709962)
Correlation between identity error and bleurt: PearsonRResult(statistic=0.01936939874311196, pvalue=0.6660067376621811)
Correlation between identity error and bertscore_p: PearsonRResult(statistic=0.007053858571020408, pvalue=0.8751045293353525)
Correlation between identity error and bertscore_r: PearsonRResult(statistic=-0.0946558085860019, pvalue=0.03452415342863532)
Correlation between identity error and bertscore_f1: PearsonRResult(statistic=-0.05767818224598431, pvalue=0.19834789484861332)
Correlation between value error and rouge1_p: PearsonRResult(statistic=-0.017588276047478056, pvalue=0.6951045351780436)
Correlation between value error and rouge1_r: PearsonRResult(statistic=-0.030180750973957646, pvalue=0.5011703947449413)
Correlation between value error and rouge1_f1: PearsonRResult(statistic=-0.008377057798446004, pvalue=0.8519246563284609)
Correlation between value error and rouge2_p: PearsonRResult(statistic=-0.07134119027642857, pvalue=0.1114581163481371)
Correlation between value error and rouge2_r: PearsonRResult(statistic=-0.09871889727906552, pvalue=0.027449094806698434)
Correlation between value error and rouge2_f1: PearsonRResult(statistic=-0.0856540078210162, pvalue=0.05586564652752899)
Correlation between value error and rougeL_p: PearsonRResult(statistic=-0.0694173070928212, pvalue=0.12146711985409905)
Correlation between value error and rougeL_r: PearsonRResult(statistic=-0.09580347335788092, pvalue=0.03238355066040999)
Correlation between value error and rougeL_f1: PearsonRResult(statistic=-0.09874587984832436, pvalue=0.0274066287749858)
Correlation between value error and rougeLsum_p: PearsonRResult(statistic=-0.07897847967862615, pvalue=0.07797218316873097)
Correlation between value error and rougeLsum_r: PearsonRResult(statistic=-0.1110888130189134, pvalue=0.013028027670664204)
Correlation between value error and rougeLsum_f1: PearsonRResult(statistic=-0.11575844213154748, pvalue=0.009651584155379575)
Correlation between value error and bleu: PearsonRResult(statistic=0.009708751462995933, pvalue=0.8287240786288186)
Correlation between value error and chrf: PearsonRResult(statistic=0.005881480042360925, pvalue=0.8957326281683717)
Correlation between value error and bleurt: PearsonRResult(statistic=0.021328570348684843, pvalue=0.6345707594819264)
Correlation between value error and bertscore_p: PearsonRResult(statistic=0.05466949962711548, pvalue=0.2228163992065775)
Correlation between value error and bertscore_r: PearsonRResult(statistic=-0.013128336238134142, pvalue=0.7698722939375447)
Correlation between value error and bertscore_f1: PearsonRResult(statistic=0.028059031906569113, pvalue=0.5317462944200255)
Correlation between deceptive error and rouge1_p: PearsonRResult(statistic=-0.15095242584696345, pvalue=0.0007170656652730681)
Correlation between deceptive error and rouge1_r: PearsonRResult(statistic=0.05265934016663194, pvalue=0.24032082909421)
Correlation between deceptive error and rouge1_f1: PearsonRResult(statistic=-0.12789256456685041, pvalue=0.00421644207956471)
Correlation between deceptive error and rouge2_p: PearsonRResult(statistic=-0.2209548865028594, pvalue=6.192182849619934e-07)
Correlation between deceptive error and rouge2_r: PearsonRResult(statistic=-0.1111048473900749, pvalue=0.01301484429255376)
Correlation between deceptive error and rouge2_f1: PearsonRResult(statistic=-0.2197498250596661, pvalue=7.149782291123501e-07)
Correlation between deceptive error and rougeL_p: PearsonRResult(statistic=-0.21053334980851687, pvalue=2.090808801668583e-06)
Correlation between deceptive error and rougeL_r: PearsonRResult(statistic=-0.0417374278398127, pvalue=0.3521584503359701)
Correlation between deceptive error and rougeL_f1: PearsonRResult(statistic=-0.21922753550144802, pvalue=7.607634198127969e-07)
Correlation between deceptive error and rougeLsum_p: PearsonRResult(statistic=-0.18976803998873823, pvalue=1.9785115361454484e-05)
Correlation between deceptive error and rougeLsum_r: PearsonRResult(statistic=-0.010584717034175965, pvalue=0.8135426306870216)
Correlation between deceptive error and rougeLsum_f1: PearsonRResult(statistic=-0.18146747943551422, pvalue=4.5539785119650817e-05)
Correlation between deceptive error and bleu: PearsonRResult(statistic=-0.1065348604286528, pvalue=0.017283594780599838)
Correlation between deceptive error and chrf: PearsonRResult(statistic=-0.03796788919696262, pvalue=0.39737618630555316)
Correlation between deceptive error and bleurt: PearsonRResult(statistic=-0.08456206613916234, pvalue=0.05907722368063398)
Correlation between deceptive error and bertscore_p: PearsonRResult(statistic=-0.13799077591104225, pvalue=0.0020046520619478947)
Correlation between deceptive error and bertscore_r: PearsonRResult(statistic=-0.10531181617425239, pvalue=0.018615889071432964)
Correlation between deceptive error and bertscore_f1: PearsonRResult(statistic=-0.15797294849649002, pvalue=0.000396706965996752)
Correlation between trend error and rouge1_p: PearsonRResult(statistic=-0.005190527419344153, pvalue=0.9079239160886066)
Correlation between trend error and rouge1_r: PearsonRResult(statistic=-0.014143269689204648, pvalue=0.7526400691563235)
Correlation between trend error and rouge1_f1: PearsonRResult(statistic=-0.0019653767968799783, pvalue=0.9650693037717939)
Correlation between trend error and rouge2_p: PearsonRResult(statistic=-0.03983609562484251, pvalue=0.3745451760758809)
Correlation between trend error and rouge2_r: PearsonRResult(statistic=-0.05048695791807818, pvalue=0.260299809779606)
Correlation between trend error and rouge2_f1: PearsonRResult(statistic=-0.047139224386028786, pvalue=0.2932797307407222)
Correlation between trend error and rougeL_p: PearsonRResult(statistic=-0.02011336153447547, pvalue=0.6539965874530833)
Correlation between trend error and rougeL_r: PearsonRResult(statistic=-0.021342279515194802, pvalue=0.6343530029529775)
Correlation between trend error and rougeL_f1: PearsonRResult(statistic=-0.02256284015556881, pvalue=0.6150938809192584)
Correlation between trend error and rougeLsum_p: PearsonRResult(statistic=-0.014714036880904016, pvalue=0.743002303231755)
Correlation between trend error and rougeLsum_r: PearsonRResult(statistic=-0.016041898003853975, pvalue=0.7207376182118842)
Correlation between trend error and rougeLsum_f1: PearsonRResult(statistic=-0.013948866378414075, pvalue=0.7559315744240782)
Correlation between trend error and bleu: PearsonRResult(statistic=-0.010364990685726078, pvalue=0.8173444111454393)
Correlation between trend error and chrf: PearsonRResult(statistic=-0.015190148219421074, pvalue=0.7349933954670784)
Correlation between trend error and bleurt: PearsonRResult(statistic=0.04751611401857732, pvalue=0.28943309486948215)
Correlation between trend error and bertscore_p: PearsonRResult(statistic=0.0033240065815657235, pvalue=0.9409573369698878)
Correlation between trend error and bertscore_r: PearsonRResult(statistic=0.03700847869784114, pvalue=0.4094192198011103)
Correlation between trend error and bertscore_f1: PearsonRResult(statistic=0.02531015908562373, pvalue=0.5727142791723467)
Correlation between chart type error and rouge1_p: PearsonRResult(statistic=-0.02882013150078772, pvalue=0.5206707333528934)
Correlation between chart type error and rouge1_r: PearsonRResult(statistic=-0.08897439652386349, pvalue=0.04697847793527142)
Correlation between chart type error and rouge1_f1: PearsonRResult(statistic=-0.08789810134331218, pvalue=0.04971940333651433)
Correlation between chart type error and rouge2_p: PearsonRResult(statistic=-0.05408543754733526, pvalue=0.22780590457516633)
Correlation between chart type error and rouge2_r: PearsonRResult(statistic=-0.09757997852650654, pvalue=0.029293910500548987)
Correlation between chart type error and rouge2_f1: PearsonRResult(statistic=-0.09393186820217983, pvalue=0.035935503219132775)
Correlation between chart type error and rougeL_p: PearsonRResult(statistic=0.0001839697675159145, pvalue=0.9967292673605229)
Correlation between chart type error and rougeL_r: PearsonRResult(statistic=-0.03185941039442374, pvalue=0.47765406156139006)
Correlation between chart type error and rougeL_f1: PearsonRResult(statistic=-0.03109848774724162, pvalue=0.4882386139827876)
Correlation between chart type error and rougeLsum_p: PearsonRResult(statistic=0.002364871077584794, pvalue=0.9579750771256883)
Correlation between chart type error and rougeLsum_r: PearsonRResult(statistic=-0.02756536540713585, pvalue=0.5389932579360297)
Correlation between chart type error and rougeLsum_f1: PearsonRResult(statistic=-0.026339440407934848, pvalue=0.5572004947127757)
Correlation between chart type error and bleu: PearsonRResult(statistic=-0.2060924022563633, pvalue=3.4482628834012474e-06)
Correlation between chart type error and chrf: PearsonRResult(statistic=-0.15493191696550607, pvalue=0.0005142318937486797)
Correlation between chart type error and bleurt: PearsonRResult(statistic=-0.1237468373060529, pvalue=0.005639994982153319)
Correlation between chart type error and bertscore_p: PearsonRResult(statistic=-0.016368260564661223, pvalue=0.715300499938717)
Correlation between chart type error and bertscore_r: PearsonRResult(statistic=-0.07651262697268114, pvalue=0.08775149455860567)
Correlation between chart type error and bertscore_f1: PearsonRResult(statistic=-0.060283108128237756, pvalue=0.17879696753245328)
Correlation between cutoff and rouge1_p: PearsonRResult(statistic=-0.17522832061146873, pvalue=8.320589264709359e-05)
Correlation between cutoff and rouge1_r: PearsonRResult(statistic=0.13753138765980333, pvalue=0.002075867359514118)
Correlation between cutoff and rouge1_f1: PearsonRResult(statistic=-0.147777515553889, pvalue=0.0009295755392226992)
Correlation between cutoff and rouge2_p: PearsonRResult(statistic=-0.13273657818826323, pvalue=0.002969959358616858)
Correlation between cutoff and rouge2_r: PearsonRResult(statistic=0.09067584058682256, pvalue=0.042904611077784156)
Correlation between cutoff and rouge2_f1: PearsonRResult(statistic=-0.08267763957984248, pvalue=0.06497832195471374)
Correlation between cutoff and rougeL_p: PearsonRResult(statistic=-0.18165733449262017, pvalue=4.4697846872062524e-05)
Correlation between cutoff and rougeL_r: PearsonRResult(statistic=0.10263285631593108, pvalue=0.021850740067614057)
Correlation between cutoff and rougeL_f1: PearsonRResult(statistic=-0.14357936281094752, pvalue=0.0013001158464102662)
Correlation between cutoff and rougeLsum_p: PearsonRResult(statistic=-0.1659798355106742, pvalue=0.0001958781503998304)
Correlation between cutoff and rougeLsum_r: PearsonRResult(statistic=0.12516123620263947, pvalue=0.005111905523886034)
Correlation between cutoff and rougeLsum_f1: PearsonRResult(statistic=-0.11664059651594441, pvalue=0.009109257448628682)
Correlation between cutoff and bleu: PearsonRResult(statistic=-0.09925476228541918, pvalue=0.02661625214536872)
Correlation between cutoff and chrf: PearsonRResult(statistic=-0.043736635927055144, pvalue=0.3295509203691574)
Correlation between cutoff and bleurt: PearsonRResult(statistic=-0.015571752462832902, pvalue=0.7285948680792758)
Correlation between cutoff and bertscore_p: PearsonRResult(statistic=-0.2621616971629473, pvalue=2.749936055227883e-09)
Correlation between cutoff and bertscore_r: PearsonRResult(statistic=-0.02907326526164567, pvalue=0.5170135523071597)
Correlation between cutoff and bertscore_f1: PearsonRResult(statistic=-0.19201614191261518, pvalue=1.5687213672177957e-05)
Correlation between unnecessary context and rouge1_p: PearsonRResult(statistic=-0.08514028308461323, pvalue=0.05735806407214493)
Correlation between unnecessary context and rouge1_r: PearsonRResult(statistic=0.1227434869491796, pvalue=0.006043838373512834)
Correlation between unnecessary context and rouge1_f1: PearsonRResult(statistic=-0.039910429213353124, pvalue=0.37365380944394644)
Correlation between unnecessary context and rouge2_p: PearsonRResult(statistic=-0.05909083473081722, pvalue=0.18756036218345312)
Correlation between unnecessary context and rouge2_r: PearsonRResult(statistic=0.08750911511464038, pvalue=0.05074235040613523)
Correlation between unnecessary context and rouge2_f1: PearsonRResult(statistic=-0.014201565941990804, pvalue=0.7516539066154613)
Correlation between unnecessary context and rougeL_p: PearsonRResult(statistic=-0.051437821757171646, pvalue=0.2514181630041471)
Correlation between unnecessary context and rougeL_r: PearsonRResult(statistic=0.14440332995092606, pvalue=0.0012181109736573244)
Correlation between unnecessary context and rougeL_f1: PearsonRResult(statistic=0.02464219376395684, pvalue=0.5828903019812615)
Correlation between unnecessary context and rougeLsum_p: PearsonRResult(statistic=-0.04050829586906031, pvalue=0.3665322155690265)
Correlation between unnecessary context and rougeLsum_r: PearsonRResult(statistic=0.1599924264315638, pvalue=0.0003330498667554696)
Correlation between unnecessary context and rougeLsum_f1: PearsonRResult(statistic=0.042846655634683144, pvalue=0.3394967369497884)
Correlation between unnecessary context and bleu: PearsonRResult(statistic=0.0062341768872653976, pvalue=0.889518850748974)
Correlation between unnecessary context and chrf: PearsonRResult(statistic=0.03773258896911712, pvalue=0.4003099158925486)
Correlation between unnecessary context and bleurt: PearsonRResult(statistic=0.08429990309002844, pvalue=0.0598706224554135)
Correlation between unnecessary context and bertscore_p: PearsonRResult(statistic=-0.06490811376113476, pvalue=0.147665100829348)
Correlation between unnecessary context and bertscore_r: PearsonRResult(statistic=0.042766790135150626, pvalue=0.3403985486339362)
Correlation between unnecessary context and bertscore_f1: PearsonRResult(statistic=-0.01470760941986352, pvalue=0.7431106149538851)
Correlation between nonsense error and rouge1_p: PearsonRResult(statistic=-0.02969013144820898, pvalue=0.5081571744669561)
Correlation between nonsense error and rouge1_r: PearsonRResult(statistic=0.014004158218377624, pvalue=0.7549949579982493)
Correlation between nonsense error and rouge1_f1: PearsonRResult(statistic=-0.009571951107942877, pvalue=0.8311009017541179)
Correlation between nonsense error and rouge2_p: PearsonRResult(statistic=-0.04463246994108254, pvalue=0.31973172038904407)
Correlation between nonsense error and rouge2_r: PearsonRResult(statistic=-0.016637776383025424, pvalue=0.7108212507579073)
Correlation between nonsense error and rouge2_f1: PearsonRResult(statistic=-0.03417178503219082, pvalue=0.44627015853802265)
Correlation between nonsense error and rougeL_p: PearsonRResult(statistic=-0.031935260108103616, pvalue=0.4766058822756493)
Correlation between nonsense error and rougeL_r: PearsonRResult(statistic=0.008093503508738279, pvalue=0.8568819570446042)
Correlation between nonsense error and rougeL_f1: PearsonRResult(statistic=-0.01203814729124093, pvalue=0.7885090789358699)
Correlation between nonsense error and rougeLsum_p: PearsonRResult(statistic=-0.05392699351531422, pvalue=0.22917303788639828)
Correlation between nonsense error and rougeLsum_r: PearsonRResult(statistic=-0.019670688069447113, pvalue=0.6611323648214039)
Correlation between nonsense error and rougeLsum_f1: PearsonRResult(statistic=-0.049147028205622625, pvalue=0.27317927118595803)
Correlation between nonsense error and bleu: PearsonRResult(statistic=0.013414773328875161, pvalue=0.7649970716255624)
Correlation between nonsense error and chrf: PearsonRResult(statistic=0.04919078912691347, pvalue=0.27275189442385145)
Correlation between nonsense error and bleurt: PearsonRResult(statistic=0.012285372264354272, pvalue=0.7842717508487885)
Correlation between nonsense error and bertscore_p: PearsonRResult(statistic=0.01846494505695768, pvalue=0.6807233436149287)
Correlation between nonsense error and bertscore_r: PearsonRResult(statistic=0.007550208364048469, pvalue=0.8663958134855262)
Correlation between nonsense error and bertscore_f1: PearsonRResult(statistic=0.017410415297881794, pvalue=0.6980358261595548)
Correlation between grammar error and rouge1_p: PearsonRResult(statistic=-0.09441074746205352, pvalue=0.034996530068141726)
Correlation between grammar error and rouge1_r: PearsonRResult(statistic=0.013507409406012902, pvalue=0.7634223593652656)
Correlation between grammar error and rouge1_f1: PearsonRResult(statistic=-0.07672669320168472, pvalue=0.0868652086249439)
Correlation between grammar error and rouge2_p: PearsonRResult(statistic=-0.08300680232664681, pvalue=0.06391393763584387)
Correlation between grammar error and rouge2_r: PearsonRResult(statistic=0.005587325651798652, pvalue=0.9009199088478216)
Correlation between grammar error and rouge2_f1: PearsonRResult(statistic=-0.05456329058599491, pvalue=0.22371786887201778)
Correlation between grammar error and rougeL_p: PearsonRResult(statistic=-0.08773544666907371, pvalue=0.05014503185036567)
Correlation between grammar error and rougeL_r: PearsonRResult(statistic=0.017123886400226395, pvalue=0.702767500262946)
Correlation between grammar error and rougeL_f1: PearsonRResult(statistic=-0.059052704427115826, pvalue=0.18784575166528)
Correlation between grammar error and rougeLsum_p: PearsonRResult(statistic=-0.08534672299961218, pvalue=0.05675441151726239)
Correlation between grammar error and rougeLsum_r: PearsonRResult(statistic=0.02041213757012436, pvalue=0.6491981085686369)
Correlation between grammar error and rougeLsum_f1: PearsonRResult(statistic=-0.05494386836218995, pvalue=0.2204996463961402)
Correlation between grammar error and bleu: PearsonRResult(statistic=0.0735130402689007, pvalue=0.1009514320642573)
Correlation between grammar error and chrf: PearsonRResult(statistic=0.05120856119215832, pvalue=0.25354006411629726)
Correlation between grammar error and bleurt: PearsonRResult(statistic=-0.03747340980368923, pvalue=0.4035563443153475)
Correlation between grammar error and bertscore_p: PearsonRResult(statistic=-0.02835408296677728, pvalue=0.527438608379933)
Correlation between grammar error and bertscore_r: PearsonRResult(statistic=-0.03888923798383329, pvalue=0.38601375511284813)
Correlation between grammar error and bertscore_f1: PearsonRResult(statistic=-0.043041046734994926, pvalue=0.3373081352149738)
Correlation between repetition and rouge1_p: PearsonRResult(statistic=0.0692399897156282, pvalue=0.12242380608180214)
Correlation between repetition and rouge1_r: PearsonRResult(statistic=-0.06000304193916435, pvalue=0.1808276971545328)
Correlation between repetition and rouge1_f1: PearsonRResult(statistic=0.05323482440974482, pvalue=0.23521365833115482)
Correlation between repetition and rouge2_p: PearsonRResult(statistic=0.0604030588402463, pvalue=0.17793240961130216)
Correlation between repetition and rouge2_r: PearsonRResult(statistic=-0.028188858927774285, pvalue=0.5298486583007904)
Correlation between repetition and rouge2_f1: PearsonRResult(statistic=0.04240407010674205, pvalue=0.3445134832287201)
Correlation between repetition and rougeL_p: PearsonRResult(statistic=0.058854774546314034, pvalue=0.1893323098244929)
Correlation between repetition and rougeL_r: PearsonRResult(statistic=-0.06360791536014904, pvalue=0.15596703065160092)
Correlation between repetition and rougeL_f1: PearsonRResult(statistic=0.03007570957658923, pvalue=0.502661979960852)
Correlation between repetition and rougeLsum_p: PearsonRResult(statistic=0.05398301689866482, pvalue=0.22868897673043273)
Correlation between repetition and rougeLsum_r: PearsonRResult(statistic=-0.06924081595157669, pvalue=0.12241933469097228)
Correlation between repetition and rougeLsum_f1: PearsonRResult(statistic=0.022409007877459378, pvalue=0.6175070385325847)
Correlation between repetition and bleu: PearsonRResult(statistic=-0.05267241251680717, pvalue=0.24020395973144468)
Correlation between repetition and chrf: PearsonRResult(statistic=-0.03254611682198194, pvalue=0.46821031898192317)
Correlation between repetition and bleurt: PearsonRResult(statistic=-0.03236707007003845, pvalue=0.470662629231897)
Correlation between repetition and bertscore_p: PearsonRResult(statistic=-0.004989838746733904, pvalue=0.9114690876680738)
Correlation between repetition and bertscore_r: PearsonRResult(statistic=-0.0040393565281438976, pvalue=0.9282820789239927)
Correlation between repetition and bertscore_f1: PearsonRResult(statistic=-0.005458535987737772, pvalue=0.9031923986048931)
Correlation between missing context and rouge1_p: PearsonRResult(statistic=-0.04310806778715585, pvalue=0.3365556622696367)
Correlation between missing context and rouge1_r: PearsonRResult(statistic=-0.12162851320856732, pvalue=0.0065229167165791035)
Correlation between missing context and rouge1_f1: PearsonRResult(statistic=-0.11556371369594247, pvalue=0.009775096131717132)
Correlation between missing context and rouge2_p: PearsonRResult(statistic=-0.06613578233276382, pvalue=0.14013945447241385)
Correlation between missing context and rouge2_r: PearsonRResult(statistic=-0.13056567953335282, pvalue=0.003479940457864281)
Correlation between missing context and rouge2_f1: PearsonRResult(statistic=-0.11794130019683506, pvalue=0.008359062882789712)
Correlation between missing context and rougeL_p: PearsonRResult(statistic=-0.03770188685147977, pvalue=0.4006936656071838)
Correlation between missing context and rougeL_r: PearsonRResult(statistic=-0.09050870819972895, pvalue=0.04329113163487766)
Correlation between missing context and rougeL_f1: PearsonRResult(statistic=-0.09895005348881761, pvalue=0.027087121535754002)
Correlation between missing context and rougeLsum_p: PearsonRResult(statistic=-0.031570938287828074, pvalue=0.4816519613427791)
Correlation between missing context and rougeLsum_r: PearsonRResult(statistic=-0.08119014712255318, pvalue=0.06997158531215208)
Correlation between missing context and rougeLsum_f1: PearsonRResult(statistic=-0.08712160391017115, pvalue=0.05177883627842077)
Correlation between missing context and bleu: PearsonRResult(statistic=-0.09510521151821565, pvalue=0.0336719940000474)
Correlation between missing context and chrf: PearsonRResult(statistic=-0.20682766240081024, pvalue=3.1765072405769146e-06)
Correlation between missing context and bleurt: PearsonRResult(statistic=-0.055468045845525615, pvalue=0.21612151674359467)
Correlation between missing context and bertscore_p: PearsonRResult(statistic=-0.0688662569978645, pvalue=0.12445943512204723)
Correlation between missing context and bertscore_r: PearsonRResult(statistic=-0.11326390092494573, pvalue=0.011343634226632624)
Correlation between missing context and bertscore_f1: PearsonRResult(statistic=-0.11949433175307572, pvalue=0.007535854167056175)
Correlation between axis error and rouge1_p: PearsonRResult(statistic=-0.08987095655458245, pvalue=0.04479309243469251)
Correlation between axis error and rouge1_r: PearsonRResult(statistic=-0.013916195174007882, pvalue=0.7564851766831144)
Correlation between axis error and rouge1_f1: PearsonRResult(statistic=-0.08685674379237407, pvalue=0.05249737756213238)
Correlation between axis error and rouge2_p: PearsonRResult(statistic=-0.10369749034587411, pvalue=0.02051102115644879)
Correlation between axis error and rouge2_r: PearsonRResult(statistic=-0.04963838295549997, pvalue=0.2684067998094685)
Correlation between axis error and rouge2_f1: PearsonRResult(statistic=-0.09525405290644177, pvalue=0.03339374314153962)
Correlation between axis error and rougeL_p: PearsonRResult(statistic=-0.0741519680734189, pvalue=0.09801504805880518)
Correlation between axis error and rougeL_r: PearsonRResult(statistic=0.003979719039349568, pvalue=0.9293381449780413)
Correlation between axis error and rougeL_f1: PearsonRResult(statistic=-0.05363675123622785, pvalue=0.23169247007217997)
Correlation between axis error and rougeLsum_p: PearsonRResult(statistic=-0.08233737883266226, pvalue=0.06609387333429469)
Correlation between axis error and rougeLsum_r: PearsonRResult(statistic=-0.006971729635435986, pvalue=0.876547008757725)
Correlation between axis error and rougeLsum_f1: PearsonRResult(statistic=-0.06766843519175232, pvalue=0.13116133830330773)
Correlation between axis error and bleu: PearsonRResult(statistic=-0.004621271003418276, pvalue=0.9179843759059017)
Correlation between axis error and chrf: PearsonRResult(statistic=-0.00628982637336354, pvalue=0.8885390315160897)
Correlation between axis error and bleurt: PearsonRResult(statistic=-0.012019869033989203, pvalue=0.7888226097836953)
Correlation between axis error and bertscore_p: PearsonRResult(statistic=-0.02017258762126521, pvalue=0.6530442477526194)
Correlation between axis error and bertscore_r: PearsonRResult(statistic=-0.06409565967736247, pvalue=0.15281228138881173)
Correlation between axis error and bertscore_f1: PearsonRResult(statistic=-0.054683815948738126, pvalue=0.22269508519574627)
Correlation between val-correct and rouge1_p: PearsonRResult(statistic=0.09375176448459728, pvalue=0.03629414580022827)
Correlation between val-correct and rouge1_r: PearsonRResult(statistic=0.029119641189100945, pvalue=0.5163449710092368)
Correlation between val-correct and rouge1_f1: PearsonRResult(statistic=0.10987483179983179, pvalue=0.014060989931791725)
Correlation between val-correct and rouge2_p: PearsonRResult(statistic=0.18750947719215377, pvalue=2.4912807208136407e-05)
Correlation between val-correct and rouge2_r: PearsonRResult(statistic=0.16384909370881331, pvalue=0.0002371023366562954)
Correlation between val-correct and rouge2_f1: PearsonRResult(statistic=0.2178269120135821, pvalue=8.978700886544616e-07)
Correlation between val-correct and rougeL_p: PearsonRResult(statistic=0.1420267756425402, pvalue=0.0014685892215223328)
Correlation between val-correct and rougeL_r: PearsonRResult(statistic=0.08080919844541909, pvalue=0.0712996623764908)
Correlation between val-correct and rougeL_f1: PearsonRResult(statistic=0.17762849363337668, pvalue=6.614584062485203e-05)
Correlation between val-correct and rougeLsum_p: PearsonRResult(statistic=0.13107198327071218, pvalue=0.0033543648044577247)
Correlation between val-correct and rougeLsum_r: PearsonRResult(statistic=0.0625353465106541, pvalue=0.16307740000085685)
Correlation between val-correct and rougeLsum_f1: PearsonRResult(statistic=0.15651419430605967, pvalue=0.00044955176123678175)
Correlation between val-correct and bleu: PearsonRResult(statistic=0.16241263959787156, pvalue=0.00026933126287363084)
Correlation between val-correct and chrf: PearsonRResult(statistic=0.13588985759280242, pvalue=0.0023496576576675808)
Correlation between val-correct and bleurt: PearsonRResult(statistic=0.053163083713608575, pvalue=0.23584611324690258)
Correlation between val-correct and bertscore_p: PearsonRResult(statistic=0.03769560235275336, pvalue=0.4007722435229063)
Correlation between val-correct and bertscore_r: PearsonRResult(statistic=0.11641429477762275, pvalue=0.00924573165553099)
Correlation between val-correct and bertscore_f1: PearsonRResult(statistic=0.1005432570108725, pvalue=0.024702240492286122)
