Commit 91d66bb

Merge pull request #7 from nRo/develop
Develop
2 parents a2b2aad + 51565b8 commit 91d66bb

50 files changed: 1478 additions and 1220 deletions

README.md

Lines changed: 139 additions & 12 deletions
@@ -52,16 +52,28 @@ To build the library from sources:
 <dependency>
 <groupId>de.unknownreality</groupId>
 <artifactId>dataframe</artifactId>
-<version>0.7.2-SNAPSHOT</version>
+<version>0.7.5-SNAPSHOT</version>
 </dependency>
 ...
 </dependencies>
 ```
+Version 0.7.5
+-----
+- **direct value access for DataRow object.**
+
+DataRows now directly access the respective values from the columns.
+This improves runtime and memory footprint for most DataFrame operations.
+DataRow objects are invalidated once the source DataFrame is changed.
+Accessing an invalidated row results in an exception
+- Row collections are now return as DataRows object.
+DataRows can be converted to a new DataFrame
+- improved 'groupBy' method
+

 Version 0.7
 -----
 - The read and write functions have been rewritten from scratch for this version.
-Some existing methods have beed removed.
+Some existing methods have been removed.

 - Data grouping has been refactored and aggregation functions can now be applied to data groupings.
 In general, data groupings can now be used like normal DataFrames.
@@ -70,6 +82,67 @@ In general, data groupings can now be used like normal DataFrames.

 - Empty DataFrame instances are now created using DataFrame.create()

+Examples
+-----
+Select all users called Meier or Schmitt from Germany, group by age and add column that contains the number of users with the respective age.
+Then sort by age and print
+```java
+URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/users.csv");
+
+DataFrame users = DataFrame.load(csvUrl, FileFormat.CSV);
+
+users.select("(name == 'Schmitt' || name == 'Meier') && country == 'Germany'")
+.groupBy("age").agg("count",Aggregate.count())
+.sort("age")
+.print();
+
+/*
+age count
+20 1
+24 2
+30 2
+*/
+
+```
+Load a csv file, set a unique column as primary key and add an index for two other columns.
+Select rows using the previously created index, change the values in their NAME column and join them with the original DataFrame.
+```java
+URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/data_index.csv");
+
+DataFrame dataFrame = DataFrame.load(csvUrl, FileFormat.CSV);
+
+dataFrame.setPrimaryKey("UID");
+dataFrame.addIndex("id_name_idx","ID","NAME");
+
+DataRow row = dataFrame.selectByPrimaryKey(1);
+System.out.println(row);
+//1;A;1
+
+DataFrame idxExample = dataFrame.selectByIndex("id_name_idx",3,"A");
+
+idxExample.print();
+/*
+ID NAME UID
+3 A 4
+3 A 8
+*/
+idxExample.getStringColumn("NAME").map((value -> value + "_idx_example"));
+idxExample.print();
+/*
+ID NAME UID
+3 A_idx_example 4
+3 A_idx_example 8
+*/
+
+dataFrame.joinInner(idxExample,"UID").print();
+/*
+ID.A NAME.A UID ID.B NAME.B
+3 A 4 3 A_idx_example
+3 A 8 3 A_idx_example
+*/
+
+```
+
 Usage
 -----
 Load DataFrame from a CSV file.
@@ -123,20 +196,74 @@ File file = new File("dataFrame.csv");
 dataFrame.write(file);
 DataFrame loadedDataFrame = DataFrame.load(file);
 ```
+
+Values within a DataFrame are accessed using DataRow objects.
+If the source DataFrame changes after a DataRow object is created, the DataRow is invalidated and can no
+longer be accessed.
+
+```java
+
+for(DataRow row : dataFrame){
+... = row.getInteger("id");
+}
+
+
+DataRows rows = dataFrame.getRows();
+
+//returns the value within the id column in the first row
+rows.get(0).getInteger("id");
+
+dataFrame.sort("name");
+
+//The DataFrame was sorted after the DataRows were obtained.
+//The first row can now differ.
+//To avoid these effects, a RuntimeException is thrown
+//if a row that was created before the DataFrame is altered is accessed
+
+rows.get(0).getInteger("id"); //throws exception
+
+rows = dataFrame.getRows();
+
+//rows is now valid again and rows can be accessed
+rows.get(0).getInteger("id");
+
+
+//DataRows can be converted to a new independent DataFrame.
+//changes to the original DataFrame have no effect on the new DataFrame.
+DataFrame dataFrame2 = rows.toDataFrame();
+
+dataFrame.sort("id");
+
+dataFrame.getRow(0).getInteger("id"); // no exception
+```
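
The invalidation rule described in the README text above (a row obtained before the DataFrame changes must not be read afterwards) can be illustrated with a version counter, the same idea behind fail-fast iterators in the Java collections. The sketch below is only an illustration under that assumption: `MiniFrame`, `MiniRow` and all their methods are hypothetical names, and nothing here is taken from the library's actual implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of version-based row invalidation; not the DataFrame library's code.
class RowInvalidationSketch {
    public static void main(String[] args) {
        MiniFrame frame = new MiniFrame(new int[] {3, 1, 2});
        MiniRow first = frame.getRow(0);
        System.out.println(first.getInteger()); // prints 3

        frame.sort(); // changes the frame, so 'first' becomes stale
        try {
            first.getInteger();
        } catch (IllegalStateException e) {
            System.out.println("stale row rejected");
        }

        // A row obtained after the change is valid again.
        System.out.println(frame.getRow(0).getInteger()); // prints 1
    }
}

// A single-column "frame" that bumps a version counter on every change.
class MiniFrame {
    private final int[] values;
    private int version = 0;

    MiniFrame(int[] values) {
        this.values = values.clone();
    }

    MiniRow getRow(int index) {
        // The row remembers the version it was created at.
        return new MiniRow(this, index, version);
    }

    void sort() {
        Arrays.sort(values);
        version++;
    }

    int valueAt(int index) {
        return values[index];
    }

    int version() {
        return version;
    }
}

// A row reads directly from its source frame and checks the version first.
class MiniRow {
    private final MiniFrame source;
    private final int index;
    private final int createdAtVersion;

    MiniRow(MiniFrame source, int index, int createdAtVersion) {
        this.source = source;
        this.index = index;
        this.createdAtVersion = createdAtVersion;
    }

    int getInteger() {
        if (source.version() != createdAtVersion) {
            throw new IllegalStateException("row was created before the frame changed");
        }
        return source.valueAt(index);
    }
}
```

Because the row holds only an index into the frame rather than a copy of the values, reads stay cheap, and the version check turns a silent wrong-row read into an immediate exception.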
+
+DataRows can be used to change values within a DataFrame
+
+```java
+DataRows rows = dataFrame.getRows();
+
+//sets the value in the second row in the name column to 'A'
+rows.get(1).set("name","A");
+
+//sets the value in the second row in the first column to 'A'
+rows.get(1).set(0,"A");
+
+
+```
+
 Use indices for fast row access.
-Indices must always be unique.
 ```java

 //set the primary key of a data frame
 users.setPrimaryKey("person_id");
-DataRow firstUser = users.findByPrimaryKey(1)
+DataRow firstUser = users.selectByPrimaryKey(1)

 //add a multi-column index

 users.addIndex("name-address","last_name","address");

-//returns all users with the last name Smith in the Example-Street 15
-List<DataRow> user = users.findByIndex("name-address","Smith","Example-Street 15")
+//returns rows containing all users with the last name Smith in the Example-Street 15
+DataRows user = users.selectRowsByIndex("name-address","Smith","Example-Street 15")
 ```
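
The multi-column index in the README snippet above looks rows up by a tuple of values, and this commit drops the sentence saying indices must be unique. One way to picture such a lookup is a hash map from value tuples to lists of row positions. The class below is a minimal sketch under that assumption; `MultiColumnIndexSketch`, `add` and `find` are hypothetical names and do not describe how the library stores its indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a non-unique multi-column index; not the DataFrame library's code.
class MultiColumnIndexSketch {
    // Maps a tuple of column values to the positions of all rows holding that tuple.
    private final Map<List<Object>, List<Integer>> rowsByKey = new HashMap<>();

    void add(int rowNumber, Object... keyValues) {
        // Copy the varargs array so later changes by the caller cannot alter the key.
        List<Object> key = Arrays.asList(keyValues.clone());
        rowsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(rowNumber);
    }

    List<Integer> find(Object... keyValues) {
        // List.equals and List.hashCode compare element by element, so tuples match by value.
        return rowsByKey.getOrDefault(Arrays.asList(keyValues), Collections.emptyList());
    }

    public static void main(String[] args) {
        MultiColumnIndexSketch index = new MultiColumnIndexSketch();
        index.add(0, "Smith", "Example-Street 15");
        index.add(1, "Meier", "Example-Street 15");
        index.add(2, "Smith", "Example-Street 15");

        System.out.println(index.find("Smith", "Example-Street 15")); // prints [0, 2]
        System.out.println(index.find("Smith", "Other-Street 1"));    // prints []
    }
}
```

Storing a list per key is what lets two rows share the same tuple, which matches a lookup that returns a collection of rows rather than a single row.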
 It is possible to define and use other index types.
 The following example shows interval indices.
@@ -157,17 +284,17 @@ IntervalIndex index = new IntervalIndex("idx",
 dataFrame.getNumberColumn("end"));
 dataFrame.addIndex(index);

-//returns rows where (start,end) overlaps with (1,3)
+//returns a new dataframe containing all rows where (start,end) overlaps with (1,3)
 // -> A, B
-dataFrame.findByIndex("idx",1,3);
+DataFrame df = dataFrame.selectByIndex("idx",1,3);

-//returns rows where (start,end) overlaps with (4,5)
+//rows where (start,end) overlaps with (4,5)
 // -> C
-dataFrame.findByIndex("idx",4,5);
+dataFrame.selectByIndex("idx",4,5);

-//returns rows where (start,end) contains 2.5
+//rows where (start,end) contains 2.5
 // -> A, B
-dataFrame.selectByIndex("idx",2.5);
+dataFrame.selectByIndex("idx",2.5);
 ```
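
The interval queries in the README snippet above come in two forms: a range that must overlap a row's (start,end) pair, and a single point that the pair must contain. The sketch below shows the underlying comparison with a plain linear scan. It is an assumption-laden illustration: `IntervalLookupSketch` and its methods are hypothetical, the intervals A, B and C are made-up values chosen to reproduce the results in the README comments (the real data lies outside this diff), the endpoints are treated as inclusive, and a real index would likely use a tree structure instead of scanning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of interval overlap and containment queries; not the library's IntervalIndex.
class IntervalLookupSketch {
    static final class Interval {
        final String name;
        final double start;
        final double end;

        Interval(String name, double start, double end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    private final List<Interval> intervals = new ArrayList<>();

    void add(String name, double start, double end) {
        intervals.add(new Interval(name, start, end));
    }

    // Two closed intervals [a,b] and [c,d] overlap exactly when a <= d and c <= b.
    List<String> overlapping(double start, double end) {
        List<String> hits = new ArrayList<>();
        for (Interval interval : intervals) {
            if (interval.start <= end && start <= interval.end) {
                hits.add(interval.name);
            }
        }
        return hits;
    }

    // A point query is an overlap query with a zero-length range.
    List<String> containing(double point) {
        return overlapping(point, point);
    }

    public static void main(String[] args) {
        IntervalLookupSketch index = new IntervalLookupSketch();
        index.add("A", 1, 3);   // made-up example data
        index.add("B", 2, 3.5);
        index.add("C", 4, 7);

        System.out.println(index.overlapping(1, 3)); // prints [A, B]
        System.out.println(index.overlapping(4, 5)); // prints [C]
        System.out.println(index.containing(2.5));   // prints [A, B]
    }
}
```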

 Perform operations on columns.

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 <groupId>de.unknownreality</groupId>
 <artifactId>dataframe</artifactId>
-<version>0.7.2-SNAPSHOT</version>
+<version>0.7.5-SNAPSHOT</version>
 <properties>
 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
 <maven.compiler.source>1.8</maven.compiler.source>

src/main/java/de/unknownreality/dataframe/AutodetectConverter.java

Lines changed: 0 additions & 117 deletions
This file was deleted.

src/main/java/de/unknownreality/dataframe/ColumnAppender.java

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@
 /**
 * Created by Alex on 13.03.2016.
 */
+@FunctionalInterface
 public interface ColumnAppender<T extends Comparable<T>> {
 /**
 * Creates the value for a new column in a row
