@@ -52,16 +52,28 @@ To build the library from sources:
5252 <dependency>
5353 <groupId>de.unknownreality</groupId>
5454 <artifactId>dataframe</artifactId>
55- <version>0.7.2-SNAPSHOT</version>
55+ <version>0.7.5-SNAPSHOT</version>
5656 </dependency>
5757...
5858</dependencies>
5959```
60+ Version 0.7.5
61+ -----
62+ - **Direct value access for DataRow objects.**
63+
64+ DataRows now directly access the respective values from the columns.
65+ This improves runtime and memory footprint for most DataFrame operations.
66+ DataRow objects are invalidated once the source DataFrame is changed.
67+ Accessing an invalidated row results in an exception.
68+ - Row collections are now returned as a DataRows object.
69+ DataRows can be converted to a new DataFrame.
70+ - Improved 'groupBy' method.
71+
6072
6173Version 0.7
6274-----
6375- The read and write functions have been rewritten from scratch for this version.
64- Some existing methods have beed removed.
76+ Some existing methods have been removed.
6577
6678- Data grouping has been refactored and aggregation functions can now be applied to data groupings.
6779In general, data groupings can now be used like normal DataFrames.
@@ -70,6 +82,67 @@ In general, data groupings can now be used like normal DataFrames.
7082
7183- Empty DataFrame instances are now created using DataFrame.create()
7284
85+ Examples
86+ -----
87+ Select all users called Meier or Schmitt from Germany, group them by age, and add a column that contains the number of users with the respective age.
88+ Then sort by age and print the result.
89+ ``` java
90+ URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/users.csv");
91+
92+ DataFrame users = DataFrame.load(csvUrl, FileFormat.CSV);
93+
94+ users.select("(name == 'Schmitt' || name == 'Meier') && country == 'Germany'")
95+     .groupBy("age").agg("count", Aggregate.count())
96+     .sort("age")
97+     .print();
98+
99+ /*
100+ age count
101+ 20 1
102+ 24 2
103+ 30 2
104+ */
105+
106+ ```
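To make the group-and-count step above more concrete, here is a minimal plain-Java sketch of the same idea. It does not use the library; the class name `GroupCountSketch`, the method `countByAge`, and the input ages are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; not part of the DataFrame library. */
class GroupCountSketch {

    /** Counts how often each age occurs; the TreeMap keeps the keys sorted by age. */
    static Map<Integer, Integer> countByAge(int[] ages) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int age : ages) {
            // start at 1 for a new age, otherwise add 1 to the existing count
            counts.merge(age, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling `GroupCountSketch.countByAge(new int[]{24, 20, 30, 24, 30})` yields one entry per distinct age, ordered by age, which mirrors the `groupBy("age")`, `count` and `sort("age")` chain conceptually.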
107+ Load a CSV file, set a unique column as the primary key, and add an index over two other columns.
108+ Select rows using the previously created index, change the values in their NAME column, and join them with the original DataFrame.
109+ ``` java
110+ URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/data_index.csv");
111+
112+ DataFrame dataFrame = DataFrame.load(csvUrl, FileFormat.CSV);
113+
114+ dataFrame.setPrimaryKey("UID");
115+ dataFrame.addIndex("id_name_idx", "ID", "NAME");
116+
117+ DataRow row = dataFrame.selectByPrimaryKey(1);
118+ System.out.println(row);
119+ // 1;A;1
120+
121+ DataFrame idxExample = dataFrame.selectByIndex("id_name_idx", 3, "A");
122+
123+ idxExample.print();
124+ /*
125+ ID NAME UID
126+ 3 A 4
127+ 3 A 8
128+ */
129+ idxExample.getStringColumn("NAME").map(value -> value + "_idx_example");
130+ idxExample.print();
131+ /*
132+ ID NAME UID
133+ 3 A_idx_example 4
134+ 3 A_idx_example 8
135+ */
136+
137+ dataFrame.joinInner(idxExample, "UID").print();
138+ /*
139+ ID.A NAME.A UID ID.B NAME.B
140+ 3 A 4 3 A_idx_example
141+ 3 A 8 3 A_idx_example
142+ */
143+
144+ ```
145+
73146Usage
74147-----
75148Load DataFrame from a CSV file.
@@ -123,20 +196,74 @@ File file = new File("dataFrame.csv");
123196dataFrame.write(file);
124197DataFrame loadedDataFrame = DataFrame.load(file);
125198```
199+
200+ Values within a DataFrame are accessed using DataRow objects.
201+ If the source DataFrame changes after a DataRow object is created, the DataRow is invalidated and can no
202+ longer be accessed.
203+
204+ ``` java
205+
206+ for (DataRow row : dataFrame) {
207+     ... = row.getInteger("id");
208+ }
209+
210+
211+ DataRows rows = dataFrame.getRows();
212+
213+ // returns the value within the id column in the first row
214+ rows.get(0).getInteger("id");
215+
216+ dataFrame.sort("name");
217+
218+ // The DataFrame was sorted after the DataRows were obtained,
219+ // so the first row can now differ.
220+ // To avoid such inconsistencies, a RuntimeException is thrown
221+ // when a row created before the DataFrame was altered is accessed.
222+
223+ rows.get(0).getInteger("id"); // throws exception
224+
225+ rows = dataFrame.getRows();
226+
227+ // rows is now valid again and its rows can be accessed
228+ rows.get(0).getInteger("id");
229+
230+
231+ // DataRows can be converted to a new independent DataFrame.
232+ // Changes to the original DataFrame have no effect on the new DataFrame.
233+ DataFrame dataFrame2 = rows.toDataFrame();
234+
235+ dataFrame.sort("id");
236+
237+ dataFrame.getRow(0).getInteger("id"); // no exception
238+ ```
239+
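One common way to get this kind of invalidation is a modification counter: the table increments a version number on every change, and each row handle remembers the version it was created for. The following is a minimal sketch of that idea under stated assumptions. It is not the library's implementation, and `VersionedTable`, `RowView` and their methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical single-column table; a stand-in, not the library's DataFrame. */
class VersionedTable {
    private final List<Integer> ids = new ArrayList<>();
    private int version = 0; // incremented on every change to the table

    void add(int id) {
        ids.add(id);
        version++;
    }

    void sortById() {
        Collections.sort(ids);
        version++;
    }

    /** Creates a row handle bound to the current version of the table. */
    RowView row(int index) {
        return new RowView(this, index, version);
    }

    /** Row handle that reads directly from the table instead of copying values. */
    static class RowView {
        private final VersionedTable table;
        private final int index;
        private final int createdAtVersion;

        RowView(VersionedTable table, int index, int createdAtVersion) {
            this.table = table;
            this.index = index;
            this.createdAtVersion = createdAtVersion;
        }

        int getId() {
            // refuse access if the table changed after this handle was created
            if (createdAtVersion != table.version) {
                throw new IllegalStateException("row is invalid: table was modified");
            }
            return table.ids.get(index);
        }
    }
}
```

In this sketch a handle obtained via `row(0)` works until `add` or `sortById` is called; afterwards `getId()` throws, and a fresh call to `row(0)` returns a valid handle again. The trade-off is the one described above: no per-row copies, at the cost of handles that must be re-fetched after a change.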
240+ DataRows can be used to change values within a DataFrame:
241+
242+ ``` java
243+ DataRows rows = dataFrame.getRows();
244+
245+ // sets the value in the second row in the name column to 'A'
246+ rows.get(1).set("name", "A");
247+
248+ // sets the value in the second row in the first column to 'A'
249+ rows.get(1).set(0, "A");
250+
251+
252+ ```
253+
126254Use indices for fast row access.
127- Indices must always be unique.
128255``` java
129256
130257// set the primary key of a data frame
131258users.setPrimaryKey("person_id");
132- DataRow firstUser = users.findByPrimaryKey(1)
259+ DataRow firstUser = users.selectByPrimaryKey(1);
133260
134261// add a multi-column index
135262
136263users.addIndex("name-address", "last_name", "address");
137264
138- // returns all users with the last name Smith in the Example-Street 15
139- List<DataRow> user = users.findByIndex("name-address", "Smith", "Example-Street 15")
265+ // returns the rows of all users with the last name Smith living at Example-Street 15
266+ DataRows user = users.selectRowsByIndex("name-address", "Smith", "Example-Street 15");
140267```
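As a rough illustration of what a non-unique multi-column index does, the sketch below maps each combination of key values to the list of row numbers that carry it. It is plain Java and makes no claim about how the library stores its indices; `MultiColumnIndexSketch`, `put` and `find` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-column index; not the library's index class. */
class MultiColumnIndexSketch {
    // key: the values of the indexed columns, value: all row numbers with that key
    private final Map<List<Object>, List<Integer>> rowsByKey = new HashMap<>();

    /** Registers a row number under the given combination of column values. */
    void put(int rowNumber, Object... keyValues) {
        List<Object> key = Arrays.asList(keyValues.clone());
        rowsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(rowNumber);
    }

    /** Returns all row numbers stored for the key, or an empty list if none match. */
    List<Integer> find(Object... keyValues) {
        return rowsByKey.getOrDefault(Arrays.asList(keyValues), Collections.emptyList());
    }
}
```

Because each key maps to a list, several rows can share the same key, and a lookup costs one hash computation instead of a scan over all rows.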
141268It is possible to define and use other index types.
142269The following example shows interval indices.
@@ -157,17 +284,17 @@ IntervalIndex index = new IntervalIndex("idx",
157284 dataFrame.getNumberColumn("end"));
158285dataFrame.addIndex(index);
159286
160- // returns rows where (start,end) overlaps with (1,3)
287+ // returns a new DataFrame containing all rows where (start,end) overlaps with (1,3)
161288// -> A, B
162- dataFrame.findByIndex("idx", 1, 3);
289+ DataFrame df = dataFrame.selectByIndex("idx", 1, 3);
163290
164- // returns rows where (start,end) overlaps with (4,5)
291+ // rows where (start,end) overlaps with (4,5)
165292// -> C
166- dataFrame.findByIndex("idx", 4, 5);
293+ dataFrame.selectByIndex("idx", 4, 5);
167294
168- // returns rows where (start,end) contains 2.5
295+ // rows where (start,end) contains 2.5
169296// -> A, B
170- dataFrame.findByIndex("idx", 2.5);
297+ dataFrame.selectByIndex("idx", 2.5);
171298```
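The overlap and containment queries above can be pictured with a simple linear scan over closed intervals. This is only a sketch of the query semantics, not the library's `IntervalIndex` (a real interval index would typically use a tree structure). The class `IntervalLookupSketch` and the interval values in the usage note are hypothetical, since the example data is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical linear-scan interval lookup; illustrates the query semantics only. */
class IntervalLookupSketch {
    private final List<String> names = new ArrayList<>();
    private final List<double[]> intervals = new ArrayList<>(); // {start, end}

    void add(String name, double start, double end) {
        names.add(name);
        intervals.add(new double[]{start, end});
    }

    /** Names of all closed intervals [start, end] that overlap [qStart, qEnd]. */
    List<String> overlapping(double qStart, double qEnd) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < intervals.size(); i++) {
            double[] iv = intervals.get(i);
            // two closed intervals overlap if each one starts before the other ends
            if (iv[0] <= qEnd && iv[1] >= qStart) {
                hits.add(names.get(i));
            }
        }
        return hits;
    }

    /** A point query is an overlap query with a zero-length interval. */
    List<String> containing(double point) {
        return overlapping(point, point);
    }
}
```

With assumed intervals A = (1, 3), B = (2, 3.5) and C = (4, 6), `overlapping(1, 3)` and `containing(2.5)` both return A and B, while `overlapping(4, 5)` returns only C.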
172299
173300Perform operations on columns.